Description

FIELD OF THE INVENTION

[0001] The present invention relates to isolated polynucleotides that represent a complete gene, or a fragment thereof that is expressed. In addition, the present invention relates to the polypeptide or protein corresponding to the coding sequence of these polynucleotides. The present invention also relates to isolated polynucleotides that represent regulatory regions of genes. The present invention also relates to isolated polynucleotides that represent untranslated regions of genes. The present invention further relates to the use of these isolated polynucleotides and polypeptides and proteins.

DESCRIPTION OF THE RELATED ART

[0002] Efforts to map and sequence the genome of a number of organisms are in progress; a few complete genome sequences, for example those of E. coli and Saccharomyces cerevisiae, are known (Blattner et al., Science 277:1453 (1997); Goffeau et al., Science 274:546 (1996)). The complete genome of a multicellular organism, C. elegans, has also been sequenced (see the C. elegans Sequencing Consortium, Science 282:2012 (1998)). To date, no complete genome of a plant has been sequenced, nor has a complete cDNA complement of any plant been sequenced.






SUMMARY OF THE INVENTION

[0003] The present invention comprises polynucleotides, such as complete cDNA sequences and/or sequences of genomic DNA encompassing complete genes, fragments of genes, and/or regulatory elements of genes and/or regions with other functions and/or intergenic regions, hereinafter collectively referred to as Sequence-Determined DNA Fragments (SDFs), from different plant species, particularly corn, wheat, soybean, rice and Arabidopsis thaliana, and other plants and/or mutants, variants, fragments or fusions of said SDFs and polypeptides or proteins derived therefrom. In some instances, the SDFs span the entirety of a protein-coding segment. In some instances, the entirety of an mRNA is represented. Other objects of the invention that are also represented by SDFs of the invention are control sequences, such as, but not limited to, promoters. Complements of any sequence of the invention are also considered part of the invention.

[0004] Other objects of the invention are polynucleotides comprising exon sequences, polynucleotides comprising intron sequences, polynucleotides comprising introns together with exons, intron/exon junction sequences, 5' untranslated sequences, and 3' untranslated sequences of the SDFs of the present invention. Polynucleotides representing the joinder of any exons described herein, in any arrangement, for example, to produce a sequence encoding any desirable amino acid sequence are within the scope of the invention.

[0005] The present invention also resides in probes useful for isolating and identifying nucleic acids that hybridize to an SDF of the invention. The probes can be of any length, but more typically are 12-2000 nucleotides in length; more typically, 15 to 200 nucleotides long; even more typically, 18 to 100 nucleotides long.

[0006] Yet another object of the invention is a method of isolating and/or identifying nucleic acids using the following steps:

(a) contacting a probe of the instant invention with a polynucleotide sample under conditions that permit hybridization and formation of a polynucleotide duplex; and

(b) detecting and/or isolating the duplex of step (a).
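Steps (a) and (b) describe a laboratory procedure. As a rough in-silico analogue only, the sketch below scans a target sequence for sites where a probe could form a duplex: because the probe pairs antiparallel with its target, a site is a window of the target that equals the probe's reverse complement. Tolerating up to `max_mismatches` mismatched positions is used here as a crude, hypothetical stand-in for lowering stringency; real duplex stability also depends on salt, temperature and base composition. The function names are illustrative, not part of the described method.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_duplex_sites(probe: str, target: str, max_mismatches: int = 0) -> list:
    """Return 0-based start positions in `target` where `probe` could pair.

    A window of `target` matches when it equals the probe's reverse
    complement with at most `max_mismatches` differing positions.
    """
    site = reverse_complement(probe.upper())
    target = target.upper()
    hits = []
    for i in range(len(target) - len(site) + 1):
        window = target[i:i + len(site)]
        mismatches = sum(1 for a, b in zip(window, site) if a != b)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


# The probe CAACG pairs with the target window CGTTG starting at index 3.
hits = find_duplex_sites("CAACG", "AAACGTTGCA")
```

Raising `max_mismatches` widens the set of reported sites, loosely mirroring how lower-stringency conditions let a probe detect related family members.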

[0007] The conditions for hybridization can be from low to moderate to high stringency conditions. The sample can include a polynucleotide having a sequence unique in a plant genome. Probes and methods of the invention are useful, for example, without limitation, for mapping of genetic traits and/or for positional cloning of a desired fragment of genomic DNA.

[0008] Probes and methods of the invention can also be used for detecting alternatively spliced messages within a species. Probes and methods of the invention can further be used to detect or isolate related genes in other plant species using genomic DNA (gDNA) and/or cDNA libraries. In some instances, especially when longer probes and low to moderate stringency hybridization conditions are used, the probe will hybridize to a plurality of cDNA and/or gDNA sequences of a plant. This approach is useful for isolating representatives of gene families which are identifiable by possession of a common functional domain in the gene product or which have common cis-acting regulatory sequences. This approach is also useful for identifying orthologous genes from other organisms.

[0009] The present invention also resides in constructs for modulating the expression of the genes comprised of all or a fragment of an SDF. The constructs comprise all or a fragment of the expressed SDF, or of a complementary sequence. Examples of constructs include ribozymes comprising RNA encoded by an SDF or by a sequence complementary thereto, antisense constructs, constructs comprising coding regions or parts thereof, constructs comprising promoters, introns, untranslated regions, scaffold attachment regions, methylating regions, enhancing or reducing regions, DNA and chromatin conformation modifying sequences, etc. Such constructs can be constructed using viral, plasmid, bacterial artificial chromosomes, plant artificial chromosomes or other types of vectors and exist in the plant as autonomous replicating sequences or as DNA integrated into the genome. When inserted into a host cell, the construct is functionally integrated with, or operatively linked to, a heterologous polynucleotide. For instance, a coding region from an SDF might be operably linked to a promoter that is functional in a plant.
[0010] The present invention also resides in host cells, including bacterial or yeast cells or plant cells, and plants that harbor constructs such as described above. Another aspect of the invention relates to methods for modulating expression of specific genes in plants by expression of the coding sequence of the constructs or by regulation of expression of one or more endogenous genes in a plant. Methods of modulation of gene expression include without limitation (1) inserting into a host cell additional copies of a polynucleotide comprising a coding sequence; (2) modulating an endogenous promoter in a host cell; (3) inserting antisense or ribozyme constructs into a host cell; and (4) inserting into a host cell a polynucleotide comprising a sequence encoding a variant, fragment, or fusion of the native polypeptides of the instant invention.

BRIEF DESCRIPTION OF THE TABLES 

[0011] The sequences of exemplary SDFs and polypeptides corresponding to the coding sequences of the instant invention are described in Reference Tables 1 and 2 ("REF Tables 1 and 2") and in Sequence Tables 1 and 2 ("SEQ Tables 1 and 2"). The REF Tables refer to a number of "Maximum Length Sequences" or "MLS." Each MLS corresponds to the longest cDNA obtained, either by cloning or by prediction from genomic sequence. The sequence of the MLS is the cDNA sequence as described in the Ac subsection of the REF Tables.
[0012] The REF Tables include the following information relating to each MLS:

I. cDNA Sequence 

A. 5' UTR 

B. Coding Sequence 

C. 3' UTR 

II. Genomic Sequence 

A. Exons 

B. Introns 

C. Promoters 

III. Link of cDNA Sequences to Clone IDs 

IV. Multiple Transcription Start Sites 

V. Polypeptide Sequences 

A. Signal Peptide 

B. Domains 

C. Related Polypeptides 

VI. Related Polynucleotide Sequences 
I. cDNA SEQUENCE

[0013] The REF Tables indicate which sequence in the SEQ Tables represents the sequence of each MLS. The MLS sequence can comprise 5' and 3' UTRs as well as coding sequences. In addition, specific cDNA clone numbers are also included in the REF Tables when the MLS sequence relates to a specific cDNA clone.

A. 5' UTR 

[0014] The location of the 5' UTR can be determined by comparing the most 5' MLS sequence with the corresponding genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at any of the transcriptional start sites and ending at the last nucleotide before any of the translational start sites, corresponds to the 5' UTR.

B. Coding Region 

[0015] The coding region is the sequence in any open reading frame found in the MLS. Coding regions of interest are indicated in the PolyP SEQ subsection of the REF Tables.

C. 3' UTR

[0016] The location of the 3' UTR can be determined by comparing the most 3' MLS sequence with the corresponding genomic sequence as indicated in the REF Tables. The sequence that matches, beginning at the translational stop site and ending at the last nucleotide of the MLS, corresponds to the 3' UTR.
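Within the MLS itself, paragraphs [0014] to [0016] amount to cutting the sequence at the translational start and stop sites. The sketch below is a minimal illustration under stated assumptions: the start position is the 1-based nucleotide of the ATG (as in the "Loc. SEQ ID NO: @ nt." field), the coding sequence runs through the first in-frame stop codon of the standard genetic code, and the 3' UTR is taken to begin immediately after that stop codon. `split_mls` and `MlsParts` are hypothetical names.

```python
from typing import NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


class MlsParts(NamedTuple):
    utr5: str
    cds: str
    utr3: str


def split_mls(mls: str, start_nt: int) -> MlsParts:
    """Split an MLS into 5' UTR, coding sequence and 3' UTR.

    start_nt is the 1-based position of the A of the ATG start codon.
    The coding sequence includes the first in-frame stop codon.
    """
    seq = mls.upper()
    i = start_nt - 1
    if seq[i:i + 3] != "ATG":
        raise ValueError(f"no ATG at position {start_nt}")
    # Step through the reading frame codon by codon until a stop codon.
    for j in range(i, len(seq) - 2, 3):
        if seq[j:j + 3] in STOP_CODONS:
            end = j + 3
            return MlsParts(seq[:i], seq[i:end], seq[end:])
    raise ValueError("no in-frame stop codon found")


parts = split_mls("ccgATGAAACCCTAGgtt", 4)
```

Here `parts` holds the 5' UTR `CCG`, the coding sequence `ATGAAACCCTAG` and the 3' UTR `GTT`.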

II. GENOMIC SEQUENCE

[0017] Further, the REF Tables indicate the specific "gi" number of the genomic sequence if the sequence resides in a public databank. For each genomic sequence, the REF Tables indicate which regions are included in the MLS. These regions include the 5' UTR, the exons and the 3' UTR, as illustrated below:



[Figure: the genomic sequence corresponding to an MLS, divided into Region 1, Region 2 and Region 3, spanning the 5' UTR, the exons and the 3' UTR; the translational start site and the stop codon are marked.]



[0018] The REF Tables report the first and last base of each region that is included in an MLS sequence. An example is shown below:

gi No. 47000:
37102 ... 37497
37593 ... 37925

The numbers indicate that the MLS contains the following sequences from two regions of gi No. 47000: a first region including bases 37102-37497, and a second region including bases 37593-37925.
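Reading such an entry programmatically only requires pulling out the first/last base pairs. The sketch below assumes each region line has the `first ... last` form shown in the example; `parse_regions` and `region_length` are hypothetical helpers, and the length counts both boundary bases, which holds whichever strand the region is on.

```python
import re

# Matches "37102 ... 37497" style region entries (whitespace optional).
REGION_RE = re.compile(r"(\d+)\s*\.\.\.\s*(\d+)")


def parse_regions(entry_lines):
    """Return (first_base, last_base) tuples for each region line."""
    regions = []
    for line in entry_lines:
        match = REGION_RE.search(line)
        if match:
            regions.append((int(match.group(1)), int(match.group(2))))
    return regions


def region_length(first, last):
    """Number of bases in a region, inclusive of both ends, on either strand."""
    return abs(last - first) + 1


entry = ["gi No. 47000:", "37102 ... 37497", "37593 ... 37925"]
regions = parse_regions(entry)
lengths = [region_length(first, last) for first, last in regions]
```

For the example, the two regions contribute 396 and 333 bases, 729 bases in total, to the MLS.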

A. EXON SEQUENCES

[0019] The location of the exons can be determined by comparing the sequence of the regions from the genomic 
sequences with the corresponding MLS sequence as indicated by the REF Tables. 

i. INITIAL EXON

[0020] To determine the location of the initial exon, information from the

(1) polypeptide sequence section;

(2) cDNA polynucleotide section; and

(3) the genomic sequence section

of the REF Tables is used. First, the polypeptide section will indicate where the translational start site is located in the MLS sequence. The MLS sequence can be matched to the genomic sequence that corresponds to the MLS. Based on the match between the MLS and corresponding genomic sequences, the location of the translational start site can be determined in the genomic sequence.

Generally, the last base of the exon of the corresponding genomic region in which the translational start site was located will represent the end of the initial exon. In some cases, the initial exon will end with a stop codon.

When the sequences representing the MLS are in the positive strand of the corresponding genomic sequence, the last base will be a larger number than the first base. When the sequences representing the MLS are in the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first base.
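The strand rule above can be applied directly to a region's first and last base. The sketch below infers the strand from the reported coordinates and extracts the region so that it reads in the same orientation as the MLS, reverse-complementing negative-strand regions. It assumes 1-based, inclusive coordinates and treats a one-base region as positive-strand; `strand_of` and `region_sequence` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_of(first: int, last: int) -> str:
    """'+' when the last base is the larger number, '-' when it is smaller."""
    return "+" if last >= first else "-"


def region_sequence(genome: str, first: int, last: int) -> str:
    """Extract a region in MLS orientation from 1-based, inclusive coordinates."""
    if strand_of(first, last) == "+":
        return genome[first - 1:last]
    # Negative strand: take the same bases, then read them on the other strand.
    return reverse_complement(genome[last - 1:first])


plus_region = region_sequence("AACCGGTT", 2, 4)   # bases 2-4 on the + strand
minus_region = region_sequence("AACCGGTT", 4, 2)  # same bases on the - strand
```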

ii. INTERNAL EXONS

[0023] Except for the regions that comprise the 5' and 3' UTRs, initial exon, and terminal exon, the remaining genomic regions that match the MLS sequence are the internal exons. Specifically, the bases defining the boundaries of the remaining regions also define the intron/exon junctions of the internal exons.

iii. TERMINAL EXON

[0024] As with the initial exon, the location of the terminal exon is determined with information from the

(1) polypeptide sequence section;

(2) cDNA polynucleotide section; and

(3) the genomic sequence section

of the REF Tables. The polypeptide section will indicate where the stop codon is located in the MLS sequence. The MLS sequence can be matched to the corresponding genomic sequence. Based on the match between the MLS and corresponding genomic sequences, the location of the stop codon can be determined in the genomic sequence. When the MLS sequences are in the positive strand of the corresponding genomic sequence, the last base will be a larger number than the first base. When the MLS sequences are in the negative strand of the corresponding genomic sequence, then the last base will be a smaller number than the first base.

B. INTRON SEQUENCES 

[0026] In addition, the introns corresponding to the MLS are defined by identifying the genomic sequence located between the regions where the genomic sequence comprises exons. Thus, introns are defined as starting one base downstream of a genomic region comprising an exon, and ending one base upstream of the next genomic region comprising an exon.
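The intron rule translates into simple coordinate arithmetic between consecutive exon regions. The sketch below assumes the regions are listed in transcript (5' to 3') order as (first base, last base) pairs, as in the REF Tables; on the negative strand coordinates decrease along the transcript, so "one base downstream" means one smaller. `introns_between` is a hypothetical name, and adjacent regions with no intervening bases are not handled.

```python
def introns_between(exon_regions):
    """Derive intron coordinates from exon regions given in transcript order.

    Each region is (first_base, last_base). An intron starts one base
    downstream of one exon region and ends one base upstream of the next.
    """
    introns = []
    for (first1, last1), (first2, _last2) in zip(exon_regions, exon_regions[1:]):
        if last1 >= first1:  # positive strand: coordinates increase 5'->3'
            introns.append((last1 + 1, first2 - 1))
        else:  # negative strand: coordinates decrease 5'->3'
            introns.append((last1 - 1, first2 + 1))
    return introns


plus_introns = introns_between([(37102, 37497), (37593, 37925)])
minus_introns = introns_between([(500, 400), (300, 200)])
```

For the gi No. 47000 example this yields a single intron spanning bases 37498-37592. With three or more regions, the regions other than the first and last (`exon_regions[1:-1]`) are the internal exons described in [0023].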

C. PROMOTER SEQUENCES 

[0027] As indicated below, promoter sequences corresponding to the MLS are defined as sequences upstream of the first exon; more usually, as sequences upstream of the first of multiple transcription start sites; even more usually, as sequences about 2,000 nucleotides upstream of the first of multiple transcription start sites.
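Taking "about 2,000 nucleotides upstream" literally, a candidate promoter window can be cut from the genomic sequence once the first transcription start site (TSS) and the strand are known. This is a minimal sketch assuming a 1-based TSS coordinate, excluding the TSS base itself, and truncating at the end of the available sequence; on the negative strand, upstream lies at higher coordinates and the window is reverse-complemented. `upstream_promoter` is a hypothetical name.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_promoter(genome: str, tss: int, strand: str, length: int = 2000) -> str:
    """Return up to `length` bases immediately upstream of a 1-based TSS.

    The result reads 5'->3' on the transcribed strand.
    """
    if strand == "+":
        start = max(0, tss - 1 - length)
        return genome[start:tss - 1]
    if strand == "-":
        return reverse_complement(genome[tss:tss + length])
    raise ValueError("strand must be '+' or '-'")


window_plus = upstream_promoter("AAAACCCCGGGG", 9, "+", length=4)
window_minus = upstream_promoter("AAAACCCCGGGG", 4, "-", length=4)
```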

III. LINK OF cDNA SEQUENCES TO CLONE IDs

[0028] As noted above, the REF Tables identify the cDNA clone(s) that relate to each MLS. The MLS sequence can be longer than the sequences included in the cDNA clones. In such a case, the REF Tables indicate the region of the MLS that is included in the clone. If either the 5' or 3' terminus of the cDNA clone sequence is the same as the MLS sequence, no mention will be made.

IV. Multiple Transcription Start Sites

[0029] Initiation of transcription can occur at a number of sites of the gene. The REF Tables indicate the possible multiple transcription start sites for each gene. In the REF Tables, the location of the transcription start sites can be either a positive or a negative number. The positions indicated by positive numbers refer to the transcription start sites as located in the MLS sequence. The negative numbers indicate the transcription start site within the genomic sequence that corresponds to the MLS.

[0030] To determine the location of the transcription start sites with the negative numbers, the MLS sequence is aligned with the corresponding genomic sequence. In the instances when a public genomic sequence is referenced, the relevant corresponding genomic sequence can be found by direct reference to the nucleotide sequence indicated by the "gi" number shown in the public genomic DNA section of the REF Tables.

When the number is negative, the transcription start site is located in the corresponding genomic sequence upstream of the base that matches the beginning of the MLS sequence in the alignment. The negative number is relative to the first base of the MLS sequence, which matches the genomic sequence corresponding to the relevant "gi" number.

[0031] In the instances when no public genomic DNA is referenced, the relevant nucleotide sequence is the nucleotide sequence associated with the amino acid sequence designated by the "gi" number of the later PolyP SEQ subsection.
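Converting a negative TSS value into a genomic coordinate depends on the strand and on exactly how the offset is counted, which the text does not spell out. The sketch below assumes that -1 denotes the genomic base immediately upstream of the base aligned with the first MLS base (-2 the next base upstream, and so on), and that the aligned coordinate and strand come from the MLS/genome alignment. `tss_to_genomic` is a hypothetical helper; positive values are left alone because they already refer to positions within the MLS.

```python
def tss_to_genomic(offset: int, mls_first_base_in_genome: int, strand: str) -> int:
    """Convert a negative REF-Table TSS value into a genomic coordinate.

    Assumption: -1 is the base immediately upstream of the genomic base that
    aligns with the first MLS base. Upstream means smaller coordinates on the
    '+' strand and larger coordinates on the '-' strand.
    """
    if offset >= 0:
        raise ValueError("non-negative values refer to positions within the MLS")
    if strand == "+":
        return mls_first_base_in_genome + offset
    if strand == "-":
        return mls_first_base_in_genome - offset
    raise ValueError("strand must be '+' or '-'")


tss_plus = tss_to_genomic(-25, 1000, "+")
tss_minus = tss_to_genomic(-25, 1000, "-")
```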

V. Polypeptide Sequences 

[0032] The PolyP SEQ subsection lists SEQ ID NOs and Ceres SEQ ID NOs for polypeptide sequences corresponding to the coding sequence of the MLS sequence and the location of the translational start site of the coding sequence.

[0033] The MLS sequence can have multiple translational start sites and can be capable of producing more than one polypeptide sequence.

A. Signal Peptide 

[0034] The REF Tables also indicate in subsection (B) the cleavage site of the putative signal peptide of the polypeptide corresponding to the coding sequence of the MLS sequence. Typically, signal peptide coding sequences comprise a sequence encoding the first residue of the polypeptide to the cleavage site residue.
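Since each residue is encoded by three nucleotides, the signal peptide coding sequence is simply the first 3 x k nucleotides of the coding sequence, where k is the cleavage-site residue. The sketch assumes the coding sequence starts at the translational start codon and that the cleavage site is given as a 1-based residue number counted from the initiator methionine, with the cleavage-site residue included in the signal peptide as the text states. `signal_peptide_coding_sequence` is a hypothetical name.

```python
def signal_peptide_coding_sequence(cds: str, cleavage_residue: int) -> str:
    """Return the nucleotides encoding residues 1..cleavage_residue, inclusive.

    cds starts at the translational start codon; each residue is one codon.
    """
    if cleavage_residue < 1 or 3 * cleavage_residue > len(cds):
        raise ValueError("cleavage residue lies outside the coding sequence")
    return cds[:3 * cleavage_residue]


signal_nt = signal_peptide_coding_sequence("ATGAAACCCGGGTAA", 3)
```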

B. Domains 

[0035] Subsection (C) provides information regarding identified domains (where present) within the polypeptide and (where present) a name for the polypeptide domain.

C. Related Polypeptides

[0036] Subsection (Dp) provides (where present) information concerning amino acid sequences that are found to be related and have some percentage of sequence identity to the polypeptide sequences of REF and SEQ TABLES 1 AND 2. These related sequences are identified by a "gi" number.

VI. Related Polynucleotide Sequences

[0037] Subsection (Dn) provides polynucleotide sequences (where present) that are related to and have some per- 
centage of sequence identity to the MLS or corresponding genomic sequence. 



Abbreviation                       Description
Max Len. Seq.                      Maximum Length Sequence
rel to                             Related to
Clone Ids                          Clone ID numbers
Pub gDNA                           Public Genomic DNA
gi No.                             gi number
Gen. seq. in cDNA                  Genomic Sequence in cDNA (Each region for a single gene prediction is listed on a separate line. In the case of multiple gene predictions, the group of regions relating to a single prediction are separated by a blank line)
(Ac) cDNA SEQ                      cDNA sequence
 - Pat. Appln. SEQ ID NO           Patent Application SEQ ID NO:
 - Ceres SEQ ID NO: 1673877        Ceres SEQ ID NO:
 - SEQ # w. TSS                    Location within the cDNA sequence, SEQ ID NO:, of transcription start sites, which are listed below
 - Clone ID #:#->#                 Clone ID comprises bases # to # of the cDNA Sequence
PolyP SEQ                          Polypeptide Sequence
 - Pat. Appln. SEQ ID NO:          Patent Application SEQ ID NO:
 - Ceres SEQ ID NO                 Ceres SEQ ID NO:
 - Loc. SEQ ID NO: @ nt.           Location of translational start site in cDNA of SEQ ID NO: at nucleotide number
(C) Pred. PP Nom. & Annot.         Nomination and Annotation of Domains within Predicted Polypeptide(s)
 - (Title)                         Name of Domain
 - Loc. SEQ ID NO #:#->#aa.        Location of the domain within the polypeptide of SEQ ID NO: from # to # amino acid residue
(Dp) Rel. AA SEQ                   Related Amino Acid Sequences
 - Align. NO                       Alignment number
 - gi No                           gi number
 - Desp.                           Description
 - % Idnt.                         Percent identity
 - Align. Len.                     Alignment Length
 - Loc. SEQ ID NO:#->#aa           Location within SEQ ID NO: from # to # amino acid residue.




DETAILED DESCRIPTION OF THE INVENTION



[0038] The invention relates to (I) polynucleotides and methods of use thereof, such as

IA. Probes, Primers and Substrates;

IB. Methods of Detection and Isolation;

B.1. Hybridization;
B.2. Methods of Mapping;
B.3. Southern Blotting;
B.4. Isolating cDNA from Related Organisms;
B.5. Isolating and/or Identifying Orthologous Genes

IC. Methods of Inhibiting Gene Expression

C.1. Antisense;
C.2. Ribozyme Constructs;
C.3. Chimeraplasts;
C.4. Co-Suppression;
C.5. Transcriptional Silencing
C.6. Other Methods to Inhibit Gene Expression

ID. Methods of Functional Analysis;

IE. Promoter Sequences and Their Use;

IF. UTRs and/or Intron Sequences and Their Use; and

IG. Coding Sequences and Their Use.

[0039] The invention also relates to (II) polypeptides and proteins and methods of use thereof, such as

IIA. Native Polypeptides and Proteins

A.1 Antibodies
A.2 In Vitro Applications

IIB. Polypeptide Variants, Fragments and Fusions

B.1 Variants
B.2 Fragments
B.3 Fusions

[0040] The invention also includes (III) methods of modulating polypeptide production, such as

IIIA. Suppression

A.1 Antisense
A.2 Ribozymes
A.3 Co-suppression
A.4 Insertion of Sequences into the Gene to be Modulated
A.5 Promoter Modulation
A.6 Expression of Genes containing Dominant-Negative Mutations

IIIB. Enhanced Expression

B.1 Insertion of an Exogenous Gene
B.2 Promoter Modulation

[0041] The invention further concerns (IV) gene constructs and vector construction, such as

IVA. Coding Sequences 
IVB. Promoters 
IVC. Signal Peptides 



[0042] The invention still further relates to (V) Transformation Techniques.



Definitions 



[0043] Allelic variant: An "allelic variant" is an alternative form of the same SDF, which resides at the same chromosomal locus in the organism. Allelic variations can occur in any portion of the gene sequence, including regulatory regions. Allelic variants can arise by normal genetic variation in a population. Allelic variants can also be produced by genetic engineering methods. An allelic variant can be one that is found in a naturally occurring plant, including a cultivar or ecotype. An allelic variant may or may not give rise to a phenotypic change, and may or may not be expressed. An allele can result in a detectable change in the phenotype of the trait represented by the locus. A phenotypically silent allele can give rise to a product.

[0044] Alternatively spliced messages: Within the context of the current invention, "alternatively spliced messages" refers to mature mRNAs originating from a single gene with variations in the number and/or identity of exons, introns and/or intron-exon junctions.

[0045] Chimeric: The term "chimeric" is used to describe genes, as defined supra, or constructs wherein at least two of the elements of the gene or construct, such as the promoter and the coding sequence and/or other regulatory sequences and/or filler sequences and/or complements thereof, are heterologous to each other.
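[0046] Constitutive Promoter: Promoters referred to herein as "constitutive promoters" actively promote transcription under most, but not necessarily all, environmental conditions and states of development or cell differentiation. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcript initiation region and the 1' or 2' promoter derived from T-DNA of Agrobacterium tumefaciens.

Coordinately expressed: The term "coordinately expressed," as used in the current invention, refers to genes that are expressed at the same or a similar time and/or stage and/or under the same or similar environmental conditions.

Domain: Domains are fingerprints or signatures that can be used to characterize protein families and/or parts of proteins. Generally, each domain has been associated with either a family of proteins or motifs. Typically, these families and/or motifs have been correlated with specific activities. A domain can be any length, including the entirety of the sequence of a protein. The domains of the polypeptides of the instant invention are described below.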

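[0050] Exogenous: "Exogenous," as referred to within, is any polynucleotide, polypeptide or protein sequence, whether chimeric or not, that is introduced into the genome of an individual host cell or the organism regenerated from said host cell by any means other than by a sexual cross. Examples of means by which this can be accomplished are described below, and include Agrobacterium-mediated transformation (of dicots, e.g., Salomon et al., EMBO J.; Herrera-Estrella et al., EMBO J.; of monocots, representative papers are those by Escudero et al., Plant J. 10:355 (1996), Ishida et al., Nature Biotechnology 14:745 (1996), and May et al.) and electroporation. Such a plant containing the exogenous nucleic acid is referred to here as a T0 for the primary transgenic plant. The term "exogenous" as used herein is also intended to encompass inserting a naturally found element into a non-naturally found location.

[0051] Filler sequence: As used herein, "filler sequence" refers to any nucleotide sequence that is inserted into a DNA construct to evoke a particular spacing between particular components such as a promoter and a coding region, and may provide an additional attribute such as a restriction enzyme site.

[0052] Gene: The term "gene," as used in the context of the current invention, encompasses all regulatory and coding sequence contiguously associated with a single hereditary unit with a genetic function. Genes can include non-coding sequences that modulate the genetic function, including, but not limited to, those that specify polyadenylation, transcriptional regulation, DNA conformation, chromatin conformation, extent and position of base methylation and binding sites of proteins that control all of these. Genes comprised of "exons" (coding sequences), which may be interrupted by "introns" (non-coding sequences), encode proteins. A gene's genetic function may require only RNA expression or protein production, or may only require binding of proteins and/or nucleic acids without associated expression.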



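Intergenic region: "Intergenic region" refers to a nucleotide sequence occurring in the genome that separates adjacent genes.

Mutant: In the context of the current invention, "mutant" refers to a heritable change in DNA sequence at a specific location.

Orthologous Gene: In the current invention, "orthologous gene" refers to a second gene that encodes a gene product that performs a similar function as the product of a first gene. The orthologous gene may also have a degree of sequence similarity to the first gene. The orthologous gene may encode a polypeptide that exhibits a degree of sequence similarity to a polypeptide corresponding to a first gene. The sequence similarity can be found within a functional domain or along the entire length of the coding sequence of the genes and/or their corresponding polypeptides.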

[0060] Percentage of sequence identity: "Percentage of sequence identity," as used herein, is determined by comparing two optimally aligned sequences over a comparison window, where the fragment of the polynucleotide or amino acid sequence in the comparison window may comprise additions or deletions (e.g., gaps or overhangs) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. (USA) 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, WI), or by inspection. Given that two sequences have been identified for comparison, GAP and BESTFIT are preferably employed to determine their optimal alignment. Typically, the default values of 5.00 for gap weight and 0.30 for gap weight length are used. The term "substantial sequence identity" between polynucleotide or polypeptide sequences refers to a polynucleotide or polypeptide comprising a sequence that has at least 80% sequence identity, preferably at least 85%, more preferably at least 90% and most preferably at least 95%, even more preferably at least 96%, 97%, 98% or 99% sequence identity compared to a reference sequence using the programs.
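The final arithmetic step of this definition (matched positions divided by positions in the comparison window, times 100) can be sketched as follows. This assumes the two sequences have already been optimally aligned by one of the programs named above and are given as equal-length strings with `-` marking gaps; the alignment step itself is not implemented, and `percent_identity` is a hypothetical name. Every alignment column, gaps included, counts toward the window.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two already-aligned, equal-length sequences.

    A column counts as matched only when both residues are identical and
    neither is a gap; every column counts toward the comparison window.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


identity = percent_identity("ACGT-A", "ACGTTA")  # 5 matches over 6 columns
```

Whether terminal overhangs are counted in the window is a choice made when the window is defined; here the window is simply the full aligned length.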
[0061] Plant Promoter: A "plant promoter" is a promoter capable of initiating transcription in plant cells and can drive or facilitate transcription of a fragment of the SDF of the instant invention. Such promoters need not be of plant origin. For example, promoters derived from plant viruses, such as the CaMV35S promoter, or from Agrobacterium tumefaciens, such as the T-DNA promoters, can be plant promoters. A typical example of a plant promoter of plant origin is the maize ubiquitin-1 (ubi-1) promoter known to those of skill.

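[0062] Promoter: The term "promoter," as used herein, refers to a region of sequence determinants located upstream from the start of transcription of a gene and which are involved in recognition and binding of RNA polymerase and other proteins to initiate and modulate transcription. A basal promoter is the minimal sequence necessary for assembly of a transcription complex required for transcription initiation. Basal promoters frequently include a "TATA box" element usually located between 15 and 35 nucleotides upstream from the site of initiation of transcription. Basal promoters also sometimes include a "CCAAT box" element (typically a sequence CCAAT) and/or a GGGCG sequence.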

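[0063] Public sequence: The term "public sequence," as used in the context of the instant application, refers to any sequence that has been deposited in a publicly accessible database. This term encompasses both amino acid and nucleotide sequences. Such sequences are publicly accessible, for example, on the BLAST databases on the NCBI FTP website (accessible at ncbi.nlm.gov/blast). The database at the NCBI FTP site uses "gi" numbers assigned by NCBI as unique identifiers for each sequence, thereby providing a non-redundant database.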
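[0064] Regulatory Sequence: The term "regulatory sequence," as used in the current invention, refers to any nucleotide sequence that influences transcription or translation of the associated gene. Regulatory sequences include, but are not limited to, promoters, promoter control elements, introns, certain sequences within a coding sequence, etc.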



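[0065] Related Sequences: "Related sequences" refer to either a polypeptide or a nucleotide sequence that exhibits some degree of sequence similarity with a reference sequence.

Some promoters direct transcription specifically in tapetum and only during anther development (Koltunow et al.).

Stringency: "Stringency," as used herein, is a function of probe length, probe composition (G+C content), salt concentration, organic solvent concentration, and temperature of hybridization and/or wash conditions. Stringency is typically compared by the parameter Tm, the temperature at which 50% of the complementary molecules in the hybridization are hybridized to their complementary strands.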

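The mathematical relationship of hybridization conditions to Tm (in °C) is expressed in the equation

$$T_m = 81.5 + 16.6\,\log_{10}[\mathrm{Na}^+] + 0.41\,(\%\mathrm{G{+}C}) - \frac{600}{N} \qquad (1)$$

where N is the length of the probe in nucleotides. The following equation for Tm of DNA-DNA hybrids is useful for probes in the range of 50 to greater than 500 nucleotides, and for conditions that include an organic solvent (formamide):

$$T_m = 81.5 + 16.6\,\log_{10}\!\left(\frac{[\mathrm{Na}^+]}{1 + 0.7\,[\mathrm{Na}^+]}\right) + 0.41\,(\%\mathrm{G{+}C}) - \frac{500}{L} - 0.63\,(\%\text{formamide}) \qquad (2)$$

where L is the length of the probe in the hybrid, in nucleotides.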
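Equations (1) and (2) can be evaluated directly. The sketch below is a minimal transcription of the two formulas as written above, assuming [Na+] is given in mol/L and %G+C and %formamide are given as percentages from 0 to 100; the function names are hypothetical. It does not account for mismatches or for other hybrid types (RNA-DNA, RNA-RNA).

```python
import math


def tm_eq1(na_molar: float, gc_percent: float, n: int) -> float:
    """Equation (1): Tm in degrees C for a probe of length N nucleotides."""
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


def tm_eq2(na_molar: float, gc_percent: float, length: int,
           formamide_percent: float = 0.0) -> float:
    """Equation (2): Tm in degrees C for DNA-DNA hybrids, with formamide."""
    salt_term = na_molar / (1.0 + 0.7 * na_molar)
    return (81.5 + 16.6 * math.log10(salt_term) + 0.41 * gc_percent
            - 500.0 / length - 0.63 * formamide_percent)


def gc_content(seq: str) -> float:
    """Percent G+C of a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


probe = "ATGCATGCATGCATGCATGC"  # 20-mer, 50% G+C
tm_probe = tm_eq1(1.0, gc_content(probe), len(probe))
print(round(tm_probe, 1))  # 72.0
```

In equation (2), each percentage point of formamide lowers Tm by 0.63 °C, so 50% formamide lowers it by 31.5 °C relative to the same conditions without formamide.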

[0073] Substantially free: A composition containing A is substantially free of B when at least 85% by weight of the total A+B in the composition is A. Preferably, A comprises at least about 90% by weight of the total of A+B in the composition, more preferably at least about 95% or even 99% by weight. For example, a plant gene or a DNA sequence can be considered substantially free of other plant genes or DNA sequences.

[0074] Translational start site: In the context of the current invention, a "translational start site" is usually an ATG in the cDNA transcript, more usually the first ATG. A single cDNA, however, may have multiple translational start sites.
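A simple way to list candidate translational start sites is to enumerate every ATG in the cDNA: the first is the usual start site, and later ones are the possible additional start sites the definition allows for. This sketch checks neither reading frame nor sequence context around the ATG, and `candidate_start_sites` is a hypothetical name.

```python
def candidate_start_sites(cdna: str) -> list:
    """Return 1-based positions of every ATG in a cDNA sequence, in order."""
    seq = cdna.upper()
    return [i + 1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


sites = candidate_start_sites("ccATGaATGtaa")
first_site = sites[0] if sites else None  # usually the translational start
```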
[0075] Transcription start site: "Transcription start site" is used in the current invention to describe the point at which transcription is initiated. This point is typically located about 25 nucleotides downstream from a TFIID binding site such as a TATA box. Transcription can initiate at one or more sites within the gene, and a single gene may have multiple transcriptional start sites, some of which may be specific for transcription in a particular cell type or tissue.
[0076] Untranslated region (UTR): A "UTR" is any contiguous series of nucleotide bases that is transcribed, but is not translated. These untranslated regions may be associated with particular functions such as increasing mRNA stability. Examples of UTRs include, but are not limited to, polyadenylation signals, termination sequences, sequences located between the transcriptional start site and the first exon (5' UTR) and sequences located between the last exon and the end of the mRNA (3' UTR).

[0077] Variant: The term "variant" is used herein to denote a polypeptide or protein or polynucleotide molecule that differs from others of its kind in some way. For example, polypeptide and protein variants can consist of changes in amino acid sequence and/or charge and/or post-translational modifications (such as glycosylation, etc.).

DETAILED DESCRIPTION OF THE INVENTION 

I. Polynucleotides 

[0078] Exemplified SDFs of the invention represent fragments of the genome of corn, wheat, rice, soybean or Arabidopsis and/or represent mRNA expressed from that genome. The isolated nucleic acid of the invention also encompasses corresponding fragments of the genome and/or cDNA complement of other organisms as described in detail below.

[0079] Polynucleotides of the invention can be isolated from polynucleotide libraries using primers comprising sequences similar to those described by the REF and SEQ Tables. See, for example, the methods described in Sambrook et al., supra.

[0080] Alternatively, the polynucleotides of the invention can be produced by chemical synthesis. Such synthesis methods are described below.

[0081] It is contemplated that the nucleotide sequences presented herein may contain some small percentage of errors. These errors may arise in the normal course of determination of nucleotide sequences. Sequence errors can be corrected by obtaining seeds deposited under the accession numbers cited above, propagating them, isolating genomic DNA or appropriate mRNA from the resulting plants or seeds thereof, amplifying the relevant fragment of the genomic DNA or mRNA using primers having a sequence that flanks the erroneous sequence, and sequencing the amplification product.

IA. Probes, Primers and Substrates

[0082] SDFs of the invention can be applied to substrates for use in array applications such as, but not limited to, assays of global gene expression, for example under varying conditions of development and growth. The arrays can also be used in diagnostic or forensic methods (WO 95/35505, US 5,445,943 and US 5,410,270).
[0083] Probes and primers of the instant invention will hybridize to a polynucleotide comprising a sequence in REF and SEQ TABLES 1 AND 2. Though many different nucleotide sequences can encode an amino acid sequence, the sequences of REF and SEQ TABLES 1 AND 2 are generally preferred for encoding polypeptides of the invention. However, the sequence of the probes and/or primers of the instant invention need not be identical to those in REF and SEQ TABLES 1 AND 2 or the complements thereof. For example, some variation in probe or primer sequence and/or length can allow additional family members to be detected, as well as orthologous genes and more taxonomically distant related sequences. Similarly, probes and/or primers of the invention can include additional nucleotides that serve as a label for detecting the formed duplex or for subsequent cloning purposes.

[0084] Probe length will vary depending on the application. For use as primers, probes are 12-40 nucleotides, preferably 18-30 nucleotides long. For use in mapping, probes are preferably 50 to 500 nucleotides, more preferably 100-250 nucleotides long. For Southern hybridizations, probes as long as several kilobases can be used as explained below.
[0085] The probes and/or primers can be produced by synthetic procedures such as the triester method of Matteucci et al., J. Am. Chem. Soc. 103:3185 (1981); or according to Urdea et al., Proc. Natl. Acad. Sci. USA 80:7461 (1983); or using commercially available automated oligonucleotide synthesizers.
IB. Methods of Detection and Isolation

[0086] The polynucleotides of the invention can be utilized in a number of methods known to those skilled in the art as probes and/or primers to isolate and detect polynucleotides, including, without limitation: Southerns, Northerns, branched DNA hybridization assays, polymerase chain reaction, and microarray assays, and variations thereof. Specific methods, given by way of example and discussed below, include:

Hybridization
Methods of Mapping
Southern Blotting
Isolating cDNA from Related Organisms
Isolating and/or Identifying Orthologous Genes

Also, the nucleic acid molecules of the invention can be used in other methods, such as high density oligonucleotide hybridizing assays, described, for example, in U.S. Pat. Nos. 6,004,753; 5,945,306; 5,945,287; 5,^^308; 5,919,686; 5,919,^1; 5,919,627; 5,874,248; 5,871,973; 5,871,971; and 5,871,930; and PCT Pub. Nos. WO 9946^; WO 9933981; WO 99^0; WO 9931252; WO 9915658; WO 9906572; WO 9858052; WO 9958672; and WO 9810858.

B.1. Hybridization

[0087] The isolated SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be used as probes and/or primers for detection and/or isolation of related polynucleotide sequences through hybridization. Hybridization of one nucleic acid to another constitutes a physical property that defines the subject SDF of the invention and the identified related sequences. Also, such hybridization imposes structural limitations on the pair. A good general discussion of the factors for determining hybridization conditions is provided by Sambrook et al. (Molecular Cloning, a Laboratory Manual, 2nd ed., c. 1989 by Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY; see esp. chapters 11 and 12). Additional considerations and details of the physical chemistry of hybridization are provided by G.H. Keller and M.M. Manak, "DNA Probes", 2nd Ed., pp. 1-25, c. 1993 by Stockton Press, New York, NY.

[0088] Depending on the stringency of the conditions under which these probes and/or primers are used, polynucleotides exhibiting a wide range of similarity to those in REF and SEQ TABLES 1 AND 2 can be detected or isolated. When the practitioner wishes to examine the result of membrane hybridizations under a variety of stringencies, an efficient way to do so is to perform the hybridization under a low stringency condition, then to wash the hybridization membrane under increasingly stringent conditions.
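The low-then-increasing-stringency approach can be sketched numerically. The snippet below is only an illustration: it assumes the common rule of thumb that each 1% of base mismatch lowers a duplex's melting temperature (Tm) by about 1°C, and the function name, offsets and step size are hypothetical choices rather than values taken from this document.

```python
def wash_schedule(tm_celsius, start_offset=40.0, end_offset=5.0, step=5.0):
    """Return (wash temperature in C, approximate minimum % identity) pairs.

    Assumption (rule of thumb): each 1% of base mismatch lowers a duplex's
    Tm by about 1 C, so a wash at Tm - x mainly retains duplexes that are
    at least roughly (100 - x)% identical.
    """
    if step <= 0 or start_offset < end_offset:
        raise ValueError("need step > 0 and start_offset >= end_offset")
    schedule = []
    offset = start_offset
    while offset >= end_offset:
        schedule.append((tm_celsius - offset, 100.0 - offset))
        offset -= step  # each successive wash is closer to Tm, i.e. more stringent
    return schedule


# Hypothetical perfect-match Tm of 85 C for the probe-target duplex.
for temp, identity in wash_schedule(85.0):
    print(f"wash at {temp:.0f} C: retains roughly >= {identity:.0f}% identity")
```

Recording the signal after each wash in the series shows at which stringency each hybridizing band disappears.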

[0089] When using SDFs to identify orthologous genes in other species, the practitioner will preferably adjust the amount of target DNA of each species so that, as nearly as is practical, the same number of genome equivalents are present for each species examined. This prevents faint signals from species having large genomes, and thus small numbers of genome equivalents per mass of DNA, from erroneously being interpreted as absence of the corresponding gene.
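As a worked illustration of equalizing genome equivalents, the DNA load for each species can be scaled by its genome size relative to a reference species. This sketch assumes the conversion of about 0.978 × 10^9 bp per picogram of DNA (Doležel et al., 2003) for counting genome equivalents; the species names and genome sizes are placeholders, not measured values.

```python
# Assumption: 1 pg of DNA ~ 0.978e9 bp (conversion factor from Dolezel et al., 2003).
BP_PER_PG = 0.978e9


def genome_equivalents(dna_ug, genome_size_bp):
    """Number of haploid genome copies contained in dna_ug micrograms of DNA."""
    pg = dna_ug * 1e6  # micrograms -> picograms
    return pg * BP_PER_PG / genome_size_bp


def equalized_loads(genome_sizes_bp, reference, reference_ug):
    """DNA mass (ug) per species carrying the same genome equivalents as the reference lane.

    Genome equivalents per ug scale as 1 / genome size, so the required load
    scales linearly with genome size relative to the reference species.
    """
    ref_size = genome_sizes_bp[reference]
    return {name: reference_ug * size / ref_size for name, size in genome_sizes_bp.items()}


# Illustrative genome sizes only; substitute measured values for real species.
sizes = {"small_genome_species": 150e6, "large_genome_species": 2.4e9}
print(equalized_loads(sizes, "small_genome_species", 5.0))  # 5 ug vs 80 ug
print(f"{genome_equivalents(5.0, 150e6):.3g} genome equivalents per lane")
```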
[0090] The probes and/or primers of the instant invention can also be used to detect or isolate nucleotides that are identical to the probes or primers. Two nucleic acid sequences or polypeptides are said to be "identical" if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below.
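A minimal way to express the identity criterion, assuming the two sequences have already been aligned (the alignment step itself is not shown), is to count matching columns. The gap-handling convention below is an assumption; other conventions, such as dividing by the length of the shorter ungapped sequence, give different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns carrying the same residue in both sequences.

    Assumed convention: columns that are gaps in both sequences are skipped,
    and a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y and x != gap:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGTACGT", "ACGTACGT"))    # 100.0
print(percent_identity("ACGT-ACGT", "ACGTTACGA"))  # 7 of 9 columns match
```

Under this definition, two sequences are "identical" exactly when the function returns 100.0.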

[0091] Isolated polynucleotides within the scope of the invention also include allelic variants of the specific sequences presented in REF and SEQ TABLES 1 AND 2. The probes and/or primers of the invention can also be used to detect and/or isolate polynucleotides exhibiting at least 80% sequence identity with the sequences of REF and SEQ TABLES 1 AND 2.

[0092] With respect to nucleotide sequences, degeneracy of the genetic code provides the possibility to substitute at least one base of the base sequence of a gene with a different base without causing the amino acid sequence of the polypeptide produced from the gene to be changed. Hence, the DNA of the present invention may also have any base sequence that has been changed from a sequence in REF and SEQ TABLES 1 AND 2 by substitution in accordance with degeneracy of the genetic code. References describing codon usage include: Carels et al., J. Mol. Evol. 46:45 (1998) and Fennoy et al., Nucl. Acids Res. 21(23):5294 (1993).
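The effect of codon degeneracy can be checked directly by translating a sequence and a variant carrying synonymous substitutions. The sketch below builds the standard genetic code table; the example sequences are hypothetical, and non-standard codes (for example, organellar ones) are not covered.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate the complete codons of a DNA (or RNA) string; '*' marks a stop."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(codon: str) -> list:
    """All codons that encode the same amino acid as `codon`."""
    aa = CODON_TABLE[codon.upper().replace("U", "T")]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


original = "ATGGCTAAA"  # hypothetical coding fragment
variant = "ATGGCCAAG"   # third-position changes in codons 2 and 3
print(translate(original), translate(variant))  # MAK MAK
print(synonymous_codons("GCT"))  # ['GCA', 'GCC', 'GCG', 'GCT']
```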

B.2. Mapping 

[0093] The isolated SDF DNA of the invention can be used to create various types of genetic and physical maps of the genome of corn, Arabidopsis, soybean, rice, wheat, or other plants. Some SDFs may be absolutely associated with particular phenotypic traits, allowing construction of gross genetic maps. While not all SDFs will immediately be associated with a phenotype, all SDFs can be used as probes for identifying polymorphisms associated with phenotypes. Briefly, one method of mapping involves total DNA isolation from individuals. The DNA is subsequently cleaved with one or more restriction enzymes, separated according to mass, transferred to a solid support, hybridized with SDF DNA, and the pattern of fragments compared. Polymorphisms associated with a particular SDF are visualized as differences in the size of fragments produced between individual DNA samples after digestion with a particular restriction enzyme and hybridization with the SDF. After identification of polymorphic SDF sequences, linkage studies can be conducted. By using the individuals showing polymorphisms as parents in crossing programs, F2 progeny recombinants or recombinant inbreds, for example, are then analyzed. The order of DNA polymorphisms along the chromosomes can be determined based on the frequency with which they are inherited together versus independently. The closer two polymorphisms are together in a chromosome, the higher the probability that they are inherited together. Integration of the relative positions of all the polymorphisms and associated marker SDFs can produce a genetic map of the species where the distances between markers reflect the recombination frequencies in that chromosome segment.
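The relationship between recombination frequency and map distance can be sketched with the standard Haldane (no interference) and Kosambi map functions. The conversion for selfed recombinant inbred lines uses the Haldane-Waddington relation R = 2r/(1 + 2r). The counts in the example are hypothetical, and practical mapping software also handles missing data, multipoint ordering and segregation distortion.

```python
import math


def recombination_fraction(recombinants: int, total: int) -> float:
    """Observed fraction of recombinant progeny between two markers."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("invalid counts")
    return recombinants / total


def ril_to_meiotic_r(R: float) -> float:
    """Convert the recombinant fraction among selfed RILs to per-meiosis r.

    Inverts the Haldane-Waddington relation R = 2r / (1 + 2r) for selfing.
    """
    if not 0 <= R <= 0.5:
        raise ValueError("R must be in [0, 0.5]")
    return R / (2 * (1 - R))


def haldane_cm(r: float) -> float:
    """Map distance in centimorgans, assuming no crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Map distance in centimorgans, allowing for crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Hypothetical backcross-style data: 20 recombinant progeny out of 100 scored.
r = recombination_fraction(20, 100)
print(f"r = {r:.2f}, Haldane {haldane_cm(r):.1f} cM, Kosambi {kosambi_cm(r):.1f} cM")
```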
[0094] The use of recombinant inbred lines for such genetic mapping is described for Arabidopsis by Alonso-Blanco et al. (... eds., c. 1998 by Humana Press, Totowa, NJ) and for corn by Burr ("Mapping Genes with Recombinant Inbreds", pp. 249-254, in Freeling, M. and V. Walbot (Ed.), The Maize Handbook, c. 1994 by Springer-Verlag New York, Inc.: New York, NY, USA; Berlin, Germany; Burr et al., Genetics (1998) 118: 519; Gardiner, J. et al., (1993) Genetics 134: 917). This procedure, however, is not limited to plants and can be used for other organisms (such as yeast) or for individual cells.

[0095] The SDFs of the present invention can also be used for simple sequence repeat (SSR) mapping. SSR mapping is described by Morgante et al. (The Plant Journal (1993) 3: 165), Panaud et al. (Genome (1995) 38: 1170), Senior et al. (Crop Science (1996) 36: 1676), Taramino et al. (Genome (1996) 39: 277) and Ahn et al. (Molecular and General Genetics (1993) 241: 483-90). SSR mapping can be achieved using various methods. In one instance, polymorphisms are identified when sequence-specific probes contained within an SDF flanking an SSR are made and used in polymerase chain reaction (PCR) assays with template DNA from two or more individuals of interest. Here, a change in the number of tandem repeats between the SSR-flanking sequences produces differently sized fragments (U.S. Patent 5,766,847). Alternatively, polymorphisms can be identified by using the PCR fragment produced from the SSR-flanking sequence-specific primer reaction as a probe against Southern blots representing different individuals (U.H. Refseth et al. (1997) Electrophoresis 18: 1519).
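The SSR length polymorphism described above can be illustrated by counting tandem repeat copies and computing the PCR product size between two flanking primers. The flanking sequences, primers and repeat unit below are hypothetical, and the sketch ignores mispriming and stutter products that complicate real SSR assays.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest run of consecutive copies of a repeat unit (e.g. 'AG') in seq."""
    pattern = f"(?:{re.escape(unit.upper())})+"
    return max((len(m.group(0)) // len(unit) for m in re.finditer(pattern, seq.upper())), default=0)


def amplicon_length(template: str, forward: str, reverse: str):
    """Length of the PCR product between a forward and a reverse primer, or None."""
    template = template.upper()
    start = template.find(forward.upper())
    site = reverse_complement(reverse)  # where the reverse primer's complement lies on the top strand
    end = template.find(site, max(start, 0))
    if start < 0 or end < 0:
        return None
    return end + len(site) - start


# Hypothetical flanks and primers around an (AG)n repeat in two individuals.
left, right = "GATTACAGG", "CCTTGAAC"
allele_1 = left + "AG" * 10 + right
allele_2 = left + "AG" * 14 + right
fwd, rev = "GATTACAGG", "GTTCAAGG"  # rev is the reverse complement of `right`
for allele in (allele_1, allele_2):
    print(max_tandem_copies(allele, "AG"), amplicon_length(allele, fwd, rev))
# 10 37
# 14 45
```

The 8-bp size difference between the two products corresponds to four extra copies of the 2-bp repeat unit.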

[0096] Genetic and physical maps of crop species have many uses. For example, these maps can be used to devise positional cloning strategies for isolating novel genes from the mapped crop species. In addition, because the genomes of closely related species are largely syntenic (that is, they display the same ordering of genes within the genome), these maps can be used to isolate novel alleles from relatives of crop species by positional cloning strategies.
[0097] The various types of maps discussed above can be used with the SDFs of the invention to identify Quantitative Trait Loci (QTLs). Many important crop traits, such as the solids content of tomatoes, are quantitative traits and result from the combined interactions of several genes. These genes reside at different loci in the genome, oftentimes on different chromosomes, and generally exhibit multiple alleles at each locus. The SDFs of the invention can be used to identify QTLs and isolate specific alleles as described by de Vicente and Tanksley (Genetics 134: 585 (1993)). In addition to isolating QTL alleles in present crop species, the SDFs of the invention can also be used to isolate alleles from the corresponding QTL of wild relatives. Transgenic plants having various combinations of QTL alleles can then be created and the effects of the combinations measured. Once a desired allele combination has been identified, crop improvement can be accomplished either through biotechnological means or by directed conventional breeding programs (for review see Tanksley and McCouch, Science 277: 1063 (1997)).

[0098] In another embodiment, the SDFs can be used to help create physical maps of the genome of corn, Arabidopsis and related species. Where SDFs have been ordered on a genetic map, as described above, they can be used as probes to discover which clones in large libraries of plant DNA fragments in YACs, BACs, etc. contain the same SDF or similar sequences, thereby facilitating the assignment of the large DNA fragments to chromosomal positions. Subsequently, the large BACs, YACs, etc. can be ordered unambiguously by more detailed studies of their sequence composition (e.g., Marra et al. (1997) Genome Research 7:1072-1084) and by using their end or other sequences to find identical sequences in other cloned DNA fragments. The overlapping of DNA sequences in this way allows large contigs of plant sequences to be built that, when sufficiently extended, provide a complete physical map of a chromosome. Sometimes the SDFs themselves will provide the means of joining cloned sequences into a contig.
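The idea of joining clones through shared end or internal sequences can be sketched as a greedy suffix-prefix overlap merge. This is a toy model under stated assumptions: error-free sequences, no repeats, and short strings standing in for BAC or YAC clones; fingerprint-based ordering (as in Marra et al.) is not modelled.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_merge(fragments, min_len):
    """Repeatedly join the ordered pair of fragments with the longest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = suffix_prefix_overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: several separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags


# Hypothetical toy sequences standing in for overlapping cloned fragments.
clones = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_merge(clones, min_len=3))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is chosen here for brevity; it can mis-join fragments that share repeated sequence, which is one reason detailed sequence-composition studies are used to order large clones unambiguously.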
[0099] The patent publication WO95/35505 and U.S. Patents 5,445,943 and 5,410,270 describe scanning multiple alleles of a plurality of loci using hybridization to arrays of oligonucleotides. These techniques are useful for each of the types of mapping discussed above.

[0100] Following the procedures described above and using a plurality of the SDFs of the present invention, any individual can be genotyped. These individual genotypes can be used for the identification of particular cultivars, varieties, lines, ecotypes and genetically modified plants, or can serve as tools for subsequent genetic studies involving multiple phenotypic traits.
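Identifying a line from its genotype at several SDF-derived markers can be sketched as a nearest-profile search. The marker names, genotype calls and the simple mismatch-fraction distance are assumptions for illustration.

```python
def genotype_distance(sample: dict, reference: dict) -> float:
    """Fraction of markers scored in both profiles at which the genotypes differ."""
    shared = [m for m in sample
              if sample[m] is not None and reference.get(m) is not None]
    if not shared:
        raise ValueError("no markers scored in both profiles")
    return sum(sample[m] != reference[m] for m in shared) / len(shared)


def closest_reference(sample: dict, references: dict) -> str:
    """Name of the reference line whose profile best matches the sample."""
    return min(references, key=lambda name: genotype_distance(sample, references[name]))


# Hypothetical marker profiles (one genotype call per SDF-derived marker).
refs = {
    "line_1": {"sdf_a": "AA", "sdf_b": "BB", "sdf_c": "AA"},
    "line_2": {"sdf_a": "BB", "sdf_b": "BB", "sdf_c": "AB"},
}
unknown = {"sdf_a": "AA", "sdf_b": "BB", "sdf_c": None}  # sdf_c failed to score
print(closest_reference(unknown, refs))  # line_1
```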

B.3. Southern Blot Hybridization

[0101] The sequences from REF and SEQ TABLES 1 AND 2 can be used as probes for various hybridization techniques. These techniques are useful for detecting target polynucleotides in a sample or for determining whether transgenic plants, seeds or host cells harbor a gene or sequence of interest and thus might be expected to exhibit a particular trait or phenotype.

[0102] In addition, the SDFs from the invention can be used to isolate additional members of gene families from the same or different species and/or orthologous genes from the same or different species. This is accomplished by hybridizing an SDF to, for example, a Southern blot containing the appropriate genomic DNA or cDNA. Given the resulting hybridization data, one of ordinary skill in the art could distinguish and isolate the correct DNA fragments by size, restriction sites, sequence and stated hybridization conditions from a gel or from a library.

[0103] Identification and isolation of orthologous genes from closely related species and alleles within a species is particularly desirable because of their potential for crop improvement. Many important crop traits, such as the solids content of tomatoes, result from the combined interactions of the products of several genes residing at different loci in the genome. Generally, alleles at each of these loci can make quantitative differences to the trait. By identifying and isolating numerous alleles for each locus from within the same species or from different species, transgenic plants with various combinations of alleles can be created and the effects of the combinations measured. Once a more favorable allele combination has been identified, crop improvement can be accomplished either through biotechnological means or by directed conventional breeding programs (Tanksley et al., Science 277:1063 (1997)).

[0104] The results from hybridizations of the SDFs of the invention to, for example, Southern blots containing DNA from another species can also be used to generate restriction fragment maps for the corresponding genomic regions. These maps provide additional information about the relative positions of restriction sites within fragments, further distinguishing mapped DNA from the remainder of the genome.

[0105] Physical maps can be made by digesting genomic DNA with different combinations of restriction enzymes.
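Fragment patterns from single and combined digests can be predicted in silico. The sketch assumes a linear molecule and uses three well-known enzymes (EcoRI, G^AATTC; BamHI, G^GATCC; HindIII, A^AGCTT); because these sites are palindromic, scanning the top strand is sufficient. The target sequence is hypothetical.

```python
import re

# Recognition site and top-strand cut offset for a few common enzymes.
ENZYMES = {"EcoRI": ("GAATTC", 1), "BamHI": ("GGATCC", 1), "HindIII": ("AAGCTT", 1)}


def cut_positions(seq: str, enzyme: str) -> list:
    site, offset = ENZYMES[enzyme]
    # Zero-width lookahead so that overlapping occurrences are also found.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def digest(seq: str, enzymes) -> list:
    """Fragment lengths (5' to 3') for a linear molecule cut by one or more enzymes."""
    cuts = sorted({p for e in enzymes for p in cut_positions(seq, e)})
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


seq = "AAAA" + "GAATTC" + "TTTTTTTT" + "GGATCC" + "CCCC"  # hypothetical 28-bp target
print(digest(seq, ["EcoRI"]))           # [5, 23]
print(digest(seq, ["BamHI"]))           # [19, 9]
print(digest(seq, ["EcoRI", "BamHI"]))  # [5, 14, 9]
```

Comparing the single and double digests places the EcoRI site 5' of the BamHI site, which is the kind of relative positional information a restriction map records.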
[0106] Probes for Southern blotting to distinguish individual restriction fragments can range in size from 15 to 20 nucleotides to several thousand nucleotides. More preferably, the probe is 100 to 1,000 nucleotides long for identifying members of a gene family when it is found that repetitive sequences would complicate the hybridization. For identifying an entire corresponding gene in another species, the probe is more preferably the length of the gene, typically 2,000 to 10,000 nucleotides, but probes 50-1,000 nucleotides long might be used. Some genes, however, might require probes up to 1,500 nucleotides long or overlapping probes constituting the full-length sequence to span their lengths.
[0107] Also, while it is preferred that the probe be homogeneous with respect to its sequence, it is not necessary. For example, as described below, a probe representing members of a gene family having diverse sequences can be generated using PCR to amplify genomic DNA or RNA templates using primers derived from SDFs that include sequences that define the gene family.

[0108] For identifying corresponding genes in another species, the next most preferable probe is a cDNA spanning the entire coding sequence, which allows all of the mRNA-coding fragment of the gene to be identified. Probes for Southern blotting can easily be generated from SDFs by making primers having the sequence at the ends of the SDF and using corn or Arabidopsis genomic DNA as a template. In instances where the SDF includes sequence conserved among species, primers including the conserved sequence can be used for PCR with genomic DNA from a species of interest to obtain a probe. Similarly, if the SDF includes a domain of interest, that fragment of the SDF can be used to make primers and, with appropriate template DNA, used to make a probe to identify genes containing the domain. Alternatively, the PCR products can be resolved, for example by gel electrophoresis, and cloned and/or sequenced. Using Southern hybridization, the variants of the domain among members of a gene family, both within and across species, can be examined.
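Generating a Southern probe from an SDF by PCR starts with primers matching its two ends. The sketch below takes the first nucleotides of the SDF as the forward primer and the reverse complement of the last nucleotides as the reverse primer, and estimates each primer's Tm with the Wallace rule (2°C per A or T, 4°C per G or C), which is only a rough guide for short oligonucleotides. The default length of 20 falls within the 18-30 nucleotide primer range given in [0084]; the example SDF is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Wallace rule Tm estimate (2 C per A/T, 4 C per G/C); rough, short oligos only."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def end_primers(sdf: str, length: int = 20):
    """Forward = first `length` nt; reverse = reverse complement of the last `length` nt."""
    if len(sdf) < 2 * length:
        raise ValueError("SDF too short for non-overlapping primers of this length")
    forward = sdf[:length].upper()
    reverse = reverse_complement(sdf[-length:])
    return forward, reverse


# Hypothetical short SDF; primers shortened to 6 nt only to keep the example readable.
fwd, rev = end_primers("ATGCGTAAAAAAAACCGGTT", length=6)
print(fwd, rev, wallace_tm(fwd), wallace_tm(rev))  # ATGCGT AACCGG 18 20
```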

B.4. Isolating DNA from Related Organisms

[0109] The SDFs of the invention can be used to isolate the corresponding DNA from other organisms. Either cDNA or genomic DNA can be isolated. For isolating genomic DNA, a lambda, cosmid, BAC or YAC, or other large insert genomic library from the plant of interest can be constructed using standard molecular biology techniques as described in detail by Sambrook et al. 1989 (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, New York) and by Ausubel et al. 1992 (Current Protocols in Molecular Biology, Greene Publishing, New York).
[0110] To screen a phage library, for example, recombinant lambda clones are plated out on appropriate bacterial medium using an appropriate E. coli host strain. The resulting plaques are lifted from the plates using nylon or nitrocellulose filters. The plaque lifts are processed through denaturation, neutralization, and washing treatments following the standard protocols outlined by Ausubel et al. (1992). The plaque lifts are hybridized to either radioactively labeled or non-radioactively labeled SDF DNA at room temperature for about 16 hours, usually in the presence of 50% formamide and 5X SSC (sodium chloride and sodium citrate) buffer and blocking reagents. The plaque lifts are then washed at 42°C with 1% sodium dodecyl sulfate (SDS) and at a particular concentration of SSC. The SSC concentration used is dependent upon the stringency at which hybridization occurred in the initial Southern blot analysis performed. For example, if a fragment hybridized under medium stringency (e.g., Tm - 20°C), then this condition is maintained or preferably adjusted to a less stringent condition (e.g., Tm - 30°C) to wash the plaque lifts. Positive clones show detectable hybridization, e.g., by exposure to X-ray films or chromogen formation. The positive clones are then subsequently isolated for purification using the same general protocol outlined above. Once the clone is purified, restriction analysis can be conducted to narrow the region corresponding to the gene of interest. The restriction analysis and succeeding subcloning steps can be done using procedures described by, for example, Sambrook et al. (1989) cited above.
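The Tm - 20°C and Tm - 30°C conditions above depend on an estimate of the hybrid's Tm. One widely used empirical formula for DNA-DNA hybrids is that of Meinkoth and Wahl (1984), together with the approximation that each 1% of mismatch lowers Tm by about 1°C. The sketch below applies it; the probe length, GC content and the use of 1X SSC (about 0.195 M Na+, from 0.15 M NaCl plus 0.015 M trisodium citrate) are illustrative assumptions, and the formula's stated validity range should be checked before relying on it.

```python
import math


def hybrid_tm(gc_fraction, length_bp, na_molar, formamide_pct=0.0, mismatch_pct=0.0):
    """Approximate Tm (C) of a DNA-DNA hybrid.

    Empirical formula of Meinkoth and Wahl (1984):
        Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L
    minus roughly 1 C per 1% mismatch (rule-of-thumb assumption).
    """
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * (100.0 * gc_fraction)
            - 0.61 * formamide_pct - 500.0 / length_bp - mismatch_pct)


def wash_targets(tm, offsets=(20, 30)):
    """Wash temperatures expressed as Tm minus an offset (e.g. Tm - 20, Tm - 30)."""
    return {f"Tm-{o}": tm - o for o in offsets}


# Hypothetical 500-bp probe, 50% GC, washed in 1X SSC (assumed ~0.195 M Na+).
tm = hybrid_tm(gc_fraction=0.5, length_bp=500, na_molar=0.195)
print(round(tm, 1), wash_targets(tm))
```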
[0111] The procedures outlined for the lambda library are essentially similar to those used for YAC library screening, except that the YAC clones are harbored in bacterial colonies. The YAC clones are plated out at reasonable density on nitrocellulose or nylon filters supported by appropriate bacterial medium in petri plates. Following the growth of the bacterial clones, the filters are processed through the denaturation, neutralization, and washing steps following the procedures of Ausubel et al. 1992. The same hybridization procedures for lambda library screening are followed.
[0112] To isolate cDNA, similar procedures using appropriately modified vectors are employed. For instance, the library can be constructed in a lambda vector appropriate for cloning cDNA, such as λgt11. Alternatively, the cDNA library can be made in a plasmid vector. cDNA for cloning can be prepared by any of the methods known in the art but is preferably prepared as described above. Preferably, a cDNA library will include a high proportion of full-length clones.

B.5. Isolating and/or Identifying Orthologous Genes

[0113] Probes and primers of the invention can be used to identify and/or isolate polynucleotides related to those in REF and SEQ TABLES 1 AND 2. Related polynucleotides are those that are native to other plant organisms and exhibit either similar sequence or encode polypeptides with similar biological activity. One specific example is an orthologous gene. Orthologous genes have the same functional activity. As such, orthologous genes may be distinguished from homologous genes. The percentage of identity is a function of evolutionary separation and, in closely related species, the percentage of identity can be 98 to 100%. The amino acid sequence of a protein encoded by an orthologous gene can be less than 75% identical, but tends to be at least 75% or at least 80% identical, more preferably at least 90%, most preferably at least 95% identical to the amino acid sequence of the reference protein. To find orthologous genes, the probes are hybridized to nucleic acids from a species of interest under low stringency conditions, preferably one where sequences containing as much as 40-45% mismatches will be able to hybridize. This condition is established by Tm - 40°C to Tm - 48°C (see below). Blots are then washed under conditions of increasing stringency. It is preferable that the wash stringency be such that sequences that are 85 to 100% identical will hybridize. More preferably, sequences 90 to 100% identical will hybridize, and most preferably only sequences greater than 95% identical will hybridize. One of ordinary skill in the art will recognize that, due to degeneracy in the genetic code, amino acid sequences that are identical can be encoded by DNA sequences as little as 67% identical or less. Thus, it is preferable, for example, to make an overlapping series of shorter probes, on the order of 24 to 45 nucleotides, and individually hybridize them to the same arrayed library to avoid the problem of degeneracy introducing large numbers of mismatches.
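The overlapping series of short probes suggested above can be generated by tiling windows along a sequence. In this sketch the probe length (30) and step (15) are hypothetical defaults chosen within the 24 to 45 nucleotide range; keeping the step shorter than the probe length makes neighbouring probes overlap, and the last window is shifted so that the 3' end is covered.

```python
def tile_probes(seq: str, probe_len: int = 30, step: int = 15):
    """Overlapping (start, probe) windows covering seq from end to end.

    The final window is shifted so that it ends exactly at the 3' end.
    """
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    if len(seq) <= probe_len:
        return [(0, seq)]
    starts = list(range(0, len(seq) - probe_len + 1, step))
    if starts[-1] != len(seq) - probe_len:
        starts.append(len(seq) - probe_len)
    return [(s, seq[s:s + probe_len]) for s in starts]


target = "ACGT" * 25  # hypothetical 100-nt sequence
probes = tile_probes(target, probe_len=30, step=15)
print([s for s, _ in probes])  # [0, 15, 30, 45, 60, 70]
```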
[0114] As evolutionary divergence increases, genome sequences also tend to diverge. Thus, one of skill will recognize that searches for orthologous genes between more divergent species will require the use of lower stringency conditions compared to searches between closely related species. Also, degeneracy of the genetic code is more of a problem for searches in the genome of a species more distant evolutionarily from the species that is the source of the SDF probe sequences.

[0115] The SDFs of the invention can also be used as probes to search for genes that are related to the SDF within a species. Such related genes are typically considered to be members of a gene family. In such a case, the sequence similarity will often be concentrated into one or a few fragments of the sequence. The fragments of similar sequence that define the gene family typically encode a fragment of a protein or RNA that has an enzymatic or structural function. The percentage of identity in the amino acid sequence of the domain that defines the gene family is preferably at least 70%, more preferably 80 to 95%, most preferably 85 to 99%. To search for members of a gene family within a species, a low stringency hybridization is usually performed, but this will depend upon the size, distribution and degree of sequence divergence of domains that define the gene family. SDFs encompassing regulatory regions can be used to identify coordinately expressed genes by using the regulatory region sequence of the SDF as a probe.
[0116] In the instances where the SDFs are identified as being expressed from genes that confer a particular phenotype, the SDFs can also be used as probes to assay plants of different species for those phenotypes.
I.C. Methods to Inhibit Gene Expression

[0117] The nucleic acid molecules of the present invention can be used to inhibit gene transcription and/or translation. Examples of such methods include, without limitation:

Antisense Constructs;
Ribozyme Constructs;
Chimeraplast Constructs;
Co-Suppression;
Transcriptional Silencing; and
Other Methods to Inhibit Gene Expression.

C.1. Antisense

[0118] In some instances it is desirable to suppress expression of an endogenous or exogenous gene. A well-known instance is the FLAVOR-SAVOR tomato, in which the gene encoding ACC synthase is inactivated by an antisense approach, thus delaying softening of the fruit after ripening. See, for example, U.S. Patent No. 5,859,330; U.S. Patent No. 5,723,766; Oeller et al., Science 254:437-439 (1991); and Hamilton et al., Nature 346:284-287 (1990). Also, timing of flowering can be controlled by suppression of FLOWERING LOCUS C (FLC); high levels of this transcript are associated with late flowering, while absence of FLC is associated with early flowering (S.D. Michaels et al., Plant Cell 11:949 (1999)). Also, the transition of the apical meristem from production of leaves with associated shoots to flowering is regulated by TERMINAL FLOWER1, APETALA1 and LEAFY. Thus, when it is desired to induce a transition from shoot production to flowering, it is desirable to suppress TFL1 expression (S.J. Liljegren et al., Plant Cell 11:1007 (1999)). As another instance, arrested ovule development and female sterility result from suppression of the ethylene forming enzyme but can be reversed by application of ethylene (D. De Martinis et al., Plant Cell 11:1061 (1999)). The ability to manipulate female fertility of plants is useful in increasing fruit production and creating hybrids.
[0119] In the case of polynucleotides used to inhibit expression of an endogenous gene, the introduced sequence need not be perfectly identical to a sequence of the target endogenous gene. The introduced polynucleotide sequence will typically be at least substantially identical to the target endogenous sequence.

[0120] Some polynucleotide SDFs in REF and SEQ TABLES 1 AND 2 represent sequences that are expressed in corn, wheat, rice, soybean, Arabidopsis and/or other plants. Thus, the invention includes using these sequences to generate antisense constructs to inhibit translation and/or degradation of transcripts of said SDFs, typically in a plant cell.
[0121] To accomplish this, a polynucleotide segment from the desired gene that can hybridize to the mRNA expressed from the desired gene (the "antisense segment") is operably linked to a promoter such that the antisense strand of RNA will be transcribed when the construct is present in a host cell. A regulated promoter can be used in the construct to control transcription of the antisense segment so that transcription occurs only under desired circumstances.
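In sequence terms, the antisense segment is the reverse complement of the chosen region of the gene, so that its transcript pairs with the mRNA. The sketch below derives such a segment and checks antiparallel pairing; the input sequence is hypothetical (written in the DNA alphabet, with T standing for U), and the promoter fusion itself is not modelled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_segment(sense: str, start: int, length: int) -> str:
    """Reverse complement (5' to 3') of sense[start:start + length].

    Cloned 5' to 3' downstream of a promoter, this segment is transcribed
    into RNA complementary to the corresponding stretch of the mRNA.
    """
    region = sense[start:start + length].upper()
    if len(region) != length:
        raise ValueError("requested region runs past the end of the sequence")
    return region.translate(COMPLEMENT)[::-1]


def pairs_antiparallel(a: str, b: str) -> bool:
    """True if a and b (both written 5' to 3') form a perfect antiparallel duplex."""
    watson_crick = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    return len(a) == len(b) and all(
        (x, y) in watson_crick for x, y in zip(a.upper(), reversed(b.upper())))


mrna_like = "ATGGCTAAACCC"  # hypothetical sense-strand sequence
seg = antisense_segment(mrna_like, 0, 6)
print(seg, pairs_antiparallel(mrna_like[:6], seg))  # AGCCAT True
```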
[0122] The antisense segment to be introduced generally will be substantially identical to at least a fragment of the endogenous gene or genes to be repressed. The sequence, however, need not be perfectly identical to inhibit expression. Further, the antisense product may hybridize to the untranslated region instead of, or in addition to, the coding sequence of the gene. The vectors of the present invention can be designed such that the inhibitory effect applies to other proteins within a family of genes exhibiting homology or substantial homology to the target gene.
[0123] For antisense suppression, the introduced antisense segment sequence also need not be full length relative to either the primary transcription product or the fully processed mRNA. Generally, a higher percentage of sequence identity can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same intron or exon pattern, and homology of noncoding segments may be equally effective. Normally, a sequence of between about 30 or 40 nucleotides and the full length of the transcript can be used, though a sequence of at least about 100 nucleotides is preferred, a sequence of at least about 200 nucleotides is more preferred, and a sequence of at least about 500 nucleotides is especially preferred.

C.2. Ribozymes

[0124] It is also contemplated that gene constructs representing ribozymes and based on the SDFs in REF AND SEQ TABLES 1 AND 2 are an object of the invention. Ribozymes can also be used to inhibit expression of genes by suppressing the translation of the mRNA into a polypeptide. It is possible to design ribozymes that specifically pair with virtually any target RNA and cleave the phosphodiester backbone at a specific location, thereby functionally inactivating the target RNA. In carrying out this cleavage, the ribozyme is not itself altered, and is thus capable of recycling and cleaving other molecules, making it a true enzyme. The inclusion of ribozyme sequences within antisense RNAs confers RNA-cleaving activity upon them, thereby increasing the activity of the constructs.
... RNAs or with a helper virus (satellite RNAs). Examples include RNAs from avocado sunblotch viroid and the satellite RNAs ...

[0126] As with antisense constructs above, the ribozyme sequence fragment necessary for pairing need not be identical to the sequences in REF AND SEQ TABLES 1 AND 2. Ribozymes may be constructed by combining the ribozyme sequence and some fragment of the target gene which would allow recognition of the target gene mRNA by the resulting ribozyme molecule. Generally, the sequence in the ribozyme capable of binding to the target sequence exhibits a percentage of sequence identity with at least 80%, preferably with at least 85%, more preferably with at least 90% and most preferably with at least 95% ... to some fragment of a sequence in REF AND SEQ TABLES 1 AND 2 or the complement thereof. The ribozyme can be equally effective in inhibiting mRNA translation by cleaving either in the untranslated or coding regions. Generally, a higher percentage of sequence identity can be used to compensate for the use of a shorter sequence. Furthermore, the introduced sequence need not have the same intron or exon pattern, and homology of non-coding segments may be equally effective.
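Choosing cleavage sites and target-recognition sequences can be illustrated for hammerhead-type ribozymes, which cleave 3' of an NUH triplet (N = any base, H = A, C or U). This class is chosen only as an example; the section does not specify a ribozyme class, and the arm length, function names and example target are assumptions. The sketch returns only the two arms complementary to the target on either side of the cleavage site and leaves the catalytic core out.

```python
import re

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp_rna(s: str) -> str:
    return s.translate(RNA_COMPLEMENT)[::-1]


def hammerhead_sites(target: str, arm: int = 8):
    """Candidate NUH cleavage sites and the two target-recognition arms for each.

    Assumed simplified design rule: arms pair with `arm` nt on each side of H,
    H itself stays unpaired, and cleavage is 3' of H. The catalytic core that
    joins the arms is deliberately not modelled.
    """
    rna = target.upper().replace("T", "U")
    sites = []
    for m in re.finditer(r"(?=[ACGU]U[ACU])", rna):
        h = m.start() + 2  # index of H in the NUH triplet
        if h - arm < 0 or h + 1 + arm > len(rna):
            continue  # not enough flanking sequence for both arms
        sites.append({
            "cleave_after": h,
            # 5' arm of the ribozyme pairs with the target 3' of the cleavage site
            "arm_5prime": revcomp_rna(rna[h + 1:h + 1 + arm]),
            # 3' arm pairs with the target 5' of H (this stretch includes N and U)
            "arm_3prime": revcomp_rna(rna[h - arm:h]),
        })
    return sites


print(hammerhead_sites("AAAGUCAAA", arm=3))
```

Because the arms are simply reverse complements of the target flanks, the identity of the arms to the target complement can be relaxed in the same way described above for the pairing fragment.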

C.3. Chimeraplasts 

[0127] The SDFs of the invention, such as those described by the REF and SEQ TABLES 1 AND 2, can also be used to construct chimeraplasts that can be introduced into a cell to produce at least one site-specific mutation. A chimeraplast is an oligonucleotide comprising DNA and RNA that specifically hybridizes to a target region in a manner which creates a mismatched base pair. This mismatched base pair signals the cell's repair enzyme machinery, which acts on the mismatched region and alters the designated nucleotide(s). The altered sequence is then expressed by the cell's normal cellular machinery. Chimeraplasts can be designed, for example, to repair mutant genes or to introduce site-specific mutations.

C.4. Sense Suppression 

[0128] The SDFs of REF and SEQ TABLES 1 AND 2 of the present invention are also useful to modulate gene expression by sense suppression. Sense suppression represents another method of gene suppression by introducing at least one exogenous copy or fragment of the endogenous sequence to be suppressed.

[0129] Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter into the chromosome of a plant or by a self-replicating virus has been shown to be an effective means by which to induce degradation of mRNAs of target genes. For an example of the use of this method to modulate expression of endogenous genes, see U.S. Patent Nos. 5,034,323, 5,231,020 and 5,283,184.

[0130] Generally, where inhibition of expression is desired, some transcription of the introduced sequence occurs. The introduced sequence generally will be substantially identical to the endogenous sequence intended to be repressed. This minimal identity will typically be greater than about ..., but a higher identity might exert a more effective reduction in the level of normal gene products. Sequence identity of more than about 80% is preferred, though about 95% to absolute identity would be most preferred. As with antisense regulation, the effect would likely apply to any other proteins within a similar family of genes exhibiting homology or substantial homology to the suppressing sequence.

C.5. Transcriptional Silencing 

[0131] The nucleic acid sequences of the invention, including the SDFs of REF and SEQ TABLES 1 AND 2, can also be used to silence transcription using oligonucleotides based on sequences in REF and SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. ... An oligonucleotide of interest is one that can bind to the promoter and block binding of a transcription factor to the ...
C.6. Other Methods to Inhibit Gene Expression

[0132] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to disrupt transcription or translation of the gene.

[0133] Homologous recombination can be used to target a polynucleotide insert to a gene by flanking ...

[0134] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene of interest. ... Insertion events in the gene of interest can be screened for using probes and/or primers based on sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. The screening can also be performed by selecting clones or R1 plants having a desired phenotype.

I.D. Methods of Functional Analysis

[0135] The constructs described in the methods under I.C. above can be used to determine the function of the polypeptide encoded by the gene that is targeted by the constructs.

[0136] Down-regulating the transcription and translation of the targeted gene in the host cell or organism, such as a plant, may produce phenotypic changes as compared to a wild-type cell or organism. In addition, in vitro assays can be used to determine if any biological activity, such as calcium flux, DNA transcription, nucleotide incorporation, etc., is being modulated by the down-regulation of the targeted gene.

[0137] Coordinated regulation of sets of genes, e.g., those contributing to a desired polygenic trait, is sometimes [...] domains can be assembled into hybrid transcriptional activators. These hybrid transcriptional activators can be used with their corresponding DNA elements (i.e., those bound by the DNA-binding SDFs) to effect coordinated expression of desired genes (J.J. Schwarz et al., Mol. Cell. Biol. 12:266 (1992); A. Martinez et al., Mol. Gen. Genet. [...]).

[0138] The SDFs of the invention can also be used in the two-hybrid genetic systems to identify networks of protein-protein interactions (L. McAlister-Henn et al., Methods 19:330 (1999); J.C. Hu et al., Methods [...]; [...] et al., J. Biol. Chem. 274:36428 (1999); K. Ichimura et al., Biochem. Biophys. [...]).

I.E. Promoters 

[0139] The SDFs of the invention are also useful as structural or regulatory sequences in a construct for modulating the expression of the corresponding gene in a plant or other organism, e.g., a symbiotic bacterium. For example, promoter sequences associated with SDFs of REF and SEQ TABLES 1 AND 2 of the present invention can be useful in directing expression of coding sequences either as constitutive promoters or to direct expression in particular cell types, tissues, or organs, or in response to environmental stimuli.

[0140] With respect to the SDFs of the present invention, a promoter is likely to be a relatively small portion of a genomic DNA (gDNA) sequence located in the first 2000 nucleotides upstream from an initial [...] in a gDNA sequence or initial ATG or methionine codon or translational start site in a corresponding cDNA sequence. [...] methionine codon or translational start site of a cDNA sequence corresponding to a gDNA sequence. In particular, the promoter is usually located upstream of the transcription start site. The fragments of a particular gDNA sequence that [...] to the sequences presented and described in REF [...] the length of the probe and its base [...]
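Locating the candidate promoter region described above (the first 2000 nucleotides upstream of the initial ATG) can be sketched as a simple slice of the genomic sequence. This is a minimal illustration under stated assumptions: the position of the ATG in the gDNA is taken as already known (for example, from aligning the corresponding cDNA), and `upstream_region` is a hypothetical helper name. The slice only defines a candidate window; whether it actually has promoter activity still has to be tested.

```python
def upstream_region(gdna: str, atg_index: int, max_len: int = 2000) -> str:
    """Return up to `max_len` nucleotides lying immediately 5' of a start codon.

    `atg_index` is the 0-based position of the A of the ATG in `gdna`; it is
    assumed to be known already (e.g. from aligning a cDNA to the gDNA).
    """
    seq = gdna.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError(f"no ATG at position {atg_index}")
    # Clip at the start of the sequence when fewer than max_len nt are available.
    return seq[max(0, atg_index - max_len):atg_index]


if __name__ == "__main__":
    genomic = "T" * 2300 + "ATGGCC"  # synthetic example sequence
    window = upstream_region(genomic, 2300)
    print(len(window))  # 2000
```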

[...] or assembly of a transcription complex comprising an RNA polymerase, for example RNA polymerase II. A typical [...] only one to bind DNA directly. The promoter might also contain one or more enhancers and/or suppressors that [...] transcription factors that have the function of modulating the level of transcription with respect to tissue specificity and of transcriptional responses to particular environmental or nutritional factors, and the like.

[0142] Short DNA sequences representing binding sites for proteins can be separated from each other by intervening sequences of varying length. For example, within a particular functional module, protein binding sites may be constituted by regions of 5 to 60, preferably 10 to 30, more preferably 10 to 20 nucleotides. Within such binding sites, there are typically 2 to 6 nucleotides that specifically contact amino acids of the nucleic acid binding protein. The protein binding sites are usually separated from each other by 10 to several hundred nucleotides, typically by 15 to [...] nucleotides, often by 20 to 50 nucleotides. DNA binding sites in promoter elements often display dyad symmetry in their sequence.
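Dyad symmetry means that a site reads the same on both strands, i.e. it equals its own reverse complement. A minimal sketch of scanning a sequence for such perfect inverted repeats follows; the window width is a hypothetical parameter, and real binding sites frequently show only partial symmetry, which this exact-match check would miss.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_dyad_symmetry(site: str) -> bool:
    """True if the site is a perfect inverted repeat (reads the same on both strands)."""
    return site.upper() == reverse_complement(site)


def find_dyad_sites(seq: str, width: int) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every window of `width` nt with dyad symmetry."""
    seq = seq.upper()
    return [
        (i, seq[i:i + width])
        for i in range(len(seq) - width + 1)
        if has_dyad_symmetry(seq[i:i + width])
    ]


if __name__ == "__main__":
    print(find_dyad_sites("TTGAATTCGG", 6))  # [(2, 'GAATTC')]
```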

[...] transcription regulatory function can be isolated from their corresponding endogenous gene, or the desired sequence can be synthesized, and recombined in constructs to direct expression of a coding [...] or other desired manner of induction or suppression.

When hybndizations are perfomned to identify or isolate elements of a promoter by hybridization to the long seauenis 

promoters^ For example short probes, constituting the element sought, are preferably used under low temperatu°e 

and/orhigh salt conditions. When longprobes,whichmi9htincludeseveralproLerelementsareused^r^^^^^^ 
stringency conditions are preferred when hybridizing to promoters across species 'owtomedium 
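Why short probes call for low temperature and high salt can be seen from their low melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a common rule of thumb for short oligonucleotides in high-salt hybridization buffer; the rule is not part of this specification, ignores sequence context and probe concentration, and becomes unreliable for longer probes. The function name is hypothetical.

```python
def wallace_tm(probe: str) -> int:
    """Rough Tm (deg C) of a short DNA oligonucleotide by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). Only a rule of thumb for short probes
    in high-salt buffer; it ignores nearest-neighbor effects.
    """
    p = probe.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for probe in ("GATC" * 3, "GATC" * 5):
        print(len(probe), wallace_tm(probe))  # prints "12 36", then "20 60"
```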

[...] the nucleotide sequence of an SDF, or part of the SDF, functions as a promoter or fragment of a promoter, then nucleotide substitutions, insertions or deletions that do not substantially [...] proteins would be considered equivalent to the exemplified nucleotide sequence. It is envisioned that there are instances [...] binding of [...] binding proteins to silence or down-regulate a [...], and vice versa. In such instances, polynucleotides representing changes to the nucleotide [...] contact region by insertion of additional nucleotides, changes to identity of relevant nucleotides, including use of [...] are encompassed by the present invention. [...] SEQ Tables and variants thereof can be [...]

[0145] Promoter function can be assayed by methods known in the art, preferably by measuring activity of a reporter [...]. Examples of reporter genes include those encoding luciferase, green fluorescent protein, GUS, neo, cat and bar.
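A reporter readout is often expressed as fold activation over a control construct. The sketch below assumes, beyond what the text states, that each sample is normalized to a co-measured reference signal (for example, a second constitutively expressed reporter) and that a control construct, such as a promoterless reporter, serves as the baseline. Function names and readings are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def normalized_activity(reporter: list[float], reference: list[float]) -> float:
    """Mean reporter/reference ratio over paired replicate readings."""
    if not reporter or len(reporter) != len(reference):
        raise ValueError("need paired, non-empty replicate readings")
    return mean(r / ref for r, ref in zip(reporter, reference))


def fold_activation(test: list[float], test_ref: list[float],
                    control: list[float], control_ref: list[float]) -> float:
    """Normalized activity of the test construct divided by that of the control."""
    return normalized_activity(test, test_ref) / normalized_activity(control, control_ref)


if __name__ == "__main__":
    # Hypothetical readings (arbitrary units), three replicates each.
    fold = fold_activation(
        test=[5200.0, 4800.0, 5000.0], test_ref=[1000.0, 1000.0, 1000.0],
        control=[210.0, 190.0, 200.0], control_ref=[1000.0, 1000.0, 1000.0],
    )
    print(f"fold activation: {fold:.1f}")
```

Normalizing each replicate before averaging keeps a single high-expressing replicate from dominating the ratio.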

I.F. UTRs and Junctions

[0146] Polynucleotides comprising untranslated (UTR) sequences and intron/exon junctions are also within the scope of the invention. UTR sequences include introns and 5' or 3' untranslated regions (5' UTRs or 3' UTRs). Fragments of [...]

[0147] These fragments of SDFs, especially UTRs, can have regulatory functions related to, for example, translation rate and mRNA stability. Thus, these fragments of SDFs can be isolated for use as elements of gene constructs for regulated production of polynucleotides encoding desired polypeptides.

[...] segments might also have regulatory functions. Sometimes regulatory elements, especially transcription enhancer or suppressor elements, are found within introns. Also, elements related to stability [...] transport to the cytoplasm for translation can be found in intron elements. Thus, these segments can also find use as elements of expression vectors intended for use to transform [...]

[...] sequences preferably will not affect the regulatory activity of the [...] transcription, or translation unless selected to do so. However, [...] or up-regulation of such activity may be desired to modulate traits or phenotypic or [...]



I.G. Coding Sequences



[...] coding sequences that encode polypeptides comprising [...] and the primary transcript is subsequently processed and translated by a host cell (or a cell-free [...]).

[0153] In addition to coding sequences encoding the polypeptide sequences of REF AND SEQ TABLES 1 AND 2, [...] as a hybridization probe.
II. Polypeptides and Proteins

II.A. Native polypeptides and proteins



[0157] Native polypeptides include the proteins encoded by the sequences shown in REF AND SEQ TABLES 1 AND 2. Such native polypeptides include those encoded by allelic variants.



A.1 Antibodies
The antibodies are also useful for examining the production level of proteins in various tissues, for example in a wild-type plant or following genetic manipulation of a plant, by methods such as Western blotting.
[0163] Antibodies of the present invention, both polyclonal and monoclonal, may be prepared by conventional methods. In general, the polypeptides of the invention are first used to immunize a suitable animal, such as a mouse, rat, rabbit, or goat. Rabbits and goats are preferred for the preparation of polyclonal sera due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat antibodies as detection reagents. Immunization is generally performed by mixing or emulsifying the protein in saline, preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may alternatively generate antibodies by in vitro immunization using methods known in the art, which for the purposes of this invention is considered equivalent to in vivo immunization.

[0164] Polyclonal antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating the blood at 25 °C for one hour, followed by incubating the blood at 4 °C for 2-18 hours. The serum is recovered by centrifugation (e.g., 1,000 × g for 10 minutes). About 20-50 ml per bleed may be obtained from rabbits.
[0165] Monoclonal antibodies are prepared using the method of Kohler and Milstein, Nature 256:495 (1975), or a modification thereof. Typically, a mouse or rat is immunized as described above. However, rather than bleeding the animal to extract serum, the spleen (and optionally several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells can be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to a plate, or well, coated with the protein antigen. B-cells producing membrane-bound immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with myeloma cells to form hybridomas, and are cultured in a selective medium (e.g., hypoxanthine, aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution, and are assayed for the production of antibodies which bind specifically to the immunizing antigen (and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then cultured either in vitro (e.g., in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites in mice).

[0166] Other methods for sustaining antibody-producing B-cell clones, such as by EBV transformation, are known.
[0167] If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P and 125I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes are typically detected by their activity. For example, horseradish peroxidase is usually detected by its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a spectrophotometer.

A.2 In Vitro Applications of Polypeptides 

[0168] Some polypeptides of the invention will have enzymatic activities that are useful in vitro. For example, the soybean trypsin inhibitor (Kunitz) family is one of the numerous families of proteinase inhibitors. It comprises plant proteins which have inhibitory activity against serine proteinases from the trypsin and subtilisin families, thiol proteinases and aspartic proteinases. Thus, these peptides find in vitro use in protein purification protocols and perhaps in therapeutic settings requiring topical application of protease inhibitors.

[0169] Delta-aminolevulinic acid dehydratase (EC 4.2.1.24) (ALAD) catalyzes the second step in the biosynthesis of heme, the condensation of two molecules of 5-aminolevulinate to form porphobilinogen, and is also involved in chlorophyll biosynthesis (Kaczor et al. (1994) Plant Physiol. 104:1411-7; Smith (1988) Biochem. J. 249:423-8; Schneider (1976) Z. Naturforsch. (C) 31:55-63). Thus, ALAD proteins can be used as catalysts in the synthesis of heme derivatives. Enzymes of biosynthetic pathways generally can be used as catalysts for in vitro synthesis of the compounds representing products of the pathway.

[0170] Polypeptides encoded by SDFs of the invention can be engineered to provide purification reagents to identify and purify additional polypeptides that bind to them. This allows one to identify proteins that function as multimers or to elucidate signal transduction or metabolic pathways. In the case of DNA binding proteins, the polypeptide can be used in a similar manner to identify the DNA determinants of specific binding (S. Pierrou et al., Anal. Biochem. 229:99 (1995); S. Chusacultanachai et al., J. Biol. Chem. 274:23591 (1999); Q. Lin et al., J. Biol. Chem. 272:27274 (1997)).

II.B. POLYPEPTIDE VARIANTS, FRAGMENTS, AND FUSIONS

[0171] Generally, variants, fragments, or fusions of the polypeptides encoded by the maximum length sequence (MLS) can exhibit at least one of the activities of the identified domains and/or related polypeptides described in Sections (C) and (D) of REF TABLES 1 and 2 corresponding to the MLS of interest.
II.B.(1) Variants

[0172] A type of variant of the native polypeptides comprises amino acid substitutions. Conservative substitutions, described above (see II.), are preferred to maintain the function or activity of the polypeptide. Such substitutions include conservation of charge, polarity, hydrophobicity, size, etc. For example, one or more amino acid residues within the sequence can be substituted with another amino acid of similar polarity that acts as a functional equivalent, for example providing a hydrogen bond in an enzymatic catalysis. Substitutes for an amino acid within an exemplified sequence are preferably made among the members of the class to which the amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine. The polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine, and glutamine. The positively charged (basic) amino acids include arginine, lysine and histidine. The negatively charged (acidic) amino acids include aspartic acid and glutamic acid.
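The class-based rule above can be expressed as a lookup table. In this minimal sketch the residue classes follow the paragraph exactly (so glycine is grouped with the polar neutral residues); other grouping schemes exist and would give different answers. The function names are hypothetical, and the comparison assumes two equal-length, ungapped sequences.

```python
from __future__ import annotations

# Residue classes exactly as listed in the paragraph above (one-letter codes).
RESIDUE_CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}


def residue_class(aa: str) -> str:
    """Return the class name of a one-letter amino acid code."""
    for name, members in RESIDUE_CLASSES.items():
        if aa.upper() in members:
            return name
    raise ValueError(f"unknown residue: {aa!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """Treat a substitution as conservative when both residues share a class."""
    return residue_class(original) == residue_class(replacement)


def list_substitutions(native: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """Compare two equal-length, ungapped sequences position by position.

    Returns (0-based position, native residue, variant residue, conservative?)
    for every position where the residues differ.
    """
    if len(native) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (i, a, b, is_conservative(a, b))
        for i, (a, b) in enumerate(zip(native.upper(), variant.upper()))
        if a != b
    ]


if __name__ == "__main__":
    print(list_substitutions("MKLV", "MRLE"))
    # [(1, 'K', 'R', True), (3, 'V', 'E', False)]
```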

[0173] Within the scope of the percentage of sequence identity described above, a polypeptide of the invention may have additional individual amino acids or amino acid sequences inserted into the polypeptide in the middle thereof and/or at the N-terminal and/or C-terminal ends thereof. Likewise, some of the amino acids or amino acid sequences may be deleted from the polypeptide. Amino acid substitutions may also be made in the sequences, conservative substitutions being preferred.

[0174] One preferred class of variants are those that comprise (1) the domain of an encoded polypeptide and/or (2) residues conserved between the encoded polypeptide and related polypeptides. For this class of variants, the encoded polypeptide sequence is changed by insertion, deletion, or substitution at positions flanking the domain and/or conserved residues.

[0175] Another class of variants includes those that comprise an encoded polypeptide sequence that is changed in the domain or conserved residues by a conservative substitution.

[0176] Yet another class of variants includes those that lack one of the in vitro activities, or structural features, of the encoded polypeptides. One example is polypeptides or proteins produced from genes comprising dominant negative mutations. Such a variant may comprise an encoded polypeptide sequence with non-conservative changes in a particular domain or group of conserved residues.

II.B.(2) FRAGMENTS

[0177] Fragments of particular interest are those that comprise a domain identified for a polypeptide encoded by an MLS of the instant invention and variants thereof. Also, fragments that comprise at least one region of residues conserved between an MLS encoded polypeptide and its related polypeptides are of great interest. Fragments are sometimes useful in the same way as polypeptides corresponding to genes comprising dominant negative mutations.

II.B.(3) FUSIONS

[0178] Of interest are chimeras comprising (1) a fragment of the MLS encoded polypeptide or variants thereof of interest and (2) a fragment of a polypeptide comprising the same domain. For example, an AP2 helix encoded by an MLS of the invention may be fused to a second AP2 helix from the ANT protein, which comprises two AP2 helices. The present invention also encompasses fusions of MLS encoded polypeptides, variants, or fragments thereof fused with related proteins or fragments thereof.

DEFINITION OF DOMAINS 

[0179] The polypeptides of the invention may possess identifying domains as shown in REF TABLES 1 and 2. Specific domains within the MLS encoded polypeptides are indicated in REF TABLES 1 and 2. In addition, the domains within the MLS encoded polypeptide can be defined by the region that exhibits at least 70% sequence identity with the consensus sequences listed in the detailed description below of each of the domains.
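The 70% identity criterion can be illustrated with an ungapped sliding-window comparison. This minimal sketch assumes the consensus is supplied as a plain residue string (the PROSITE-style patterns given below would first need translating); practical domain assignment normally relies on alignments or profile models such as those from Pfam. Function names are hypothetical.

```python
from __future__ import annotations


def best_window_identity(protein: str, consensus: str) -> tuple[int, float]:
    """Slide `consensus` along `protein` without gaps.

    Returns (0-based start of the best window, percent identity of that window).
    """
    protein, consensus = protein.upper(), consensus.upper()
    k = len(consensus)
    if k == 0 or k > len(protein):
        raise ValueError("consensus must be non-empty and no longer than the protein")
    best_start, best_pid = 0, -1.0
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        matches = sum(a == b for a, b in zip(window, consensus))
        pid = 100.0 * matches / k
        if pid > best_pid:  # keeps the first window reaching the maximum
            best_start, best_pid = start, pid
    return best_start, best_pid


def has_domain(protein: str, consensus: str, threshold: float = 70.0) -> bool:
    """True if some window reaches the identity cut-off (70% by default)."""
    return best_window_identity(protein, consensus)[1] >= threshold


if __name__ == "__main__":
    print(best_window_identity("MKTAYIAKQR", "AYLAKQ"))  # best window starts at 3
```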
[0180] The majority of the protein domain descriptions given below are obtained from Prosite (http://www.expasy.ch/prosite/) and Pfam (http://pfam.wustl.edu/browse.shtml).

1. (AAA) AAA-protein family signature

[0181] A large family of ATPases has been described [1 to 5] whose key feature is that they share a conserved region of about 220 amino acids that contains an ATP-binding site. This family is now called AAA, for 'A'TPases 'A'ssociated with diverse cellular 'A'ctivities. The proteins that belong to this family contain either one or two AAA domains. Proteins containing two AAA domains:



- Mammalian and drosophila NSF (N-ethylmaleimide-sensitive fusion protein) and the fungal homolog SEC18. These proteins are involved in intracellular transport between the endoplasmic reticulum and Golgi, as well as between different Golgi cisternae.

- Mammalian transitional endoplasmic reticulum ATPase (previously known as p97 or VCP), which is involved in the transfer of membranes from the endoplasmic reticulum to the Golgi apparatus. This protein forms a ring-shaped [...] spindle [...]

- Yeast protein PAS1, essential for peroxisome assembly, and the related protein PAS1 from Pichia pastoris.

- Yeast protein AFG2.

- Sulfolobus acidocaldarius protein SAV and Halobacterium salinarium cdcH, which may be part of a transduction pathway connecting light to cell division.

[0182] Proteins containing a single AAA domain: 

- [...] and other bacteria ftsH (or hflB) protein. FtsH is an ATP-dependent zinc metallopeptidase that seems to degrade the heat-shock sigma-32 factor. [...] cytoplasmic C-terminal domain that contains both the AAA and the protease domains.

- Yeast protein YME1, a protein important for maintaining the integrity of the mitochondrial compartment. YME1 is also a zinc-dependent protease.

- Yeast protein AFG3 (or YTA10). This protein also seems to contain an AAA domain followed by a zinc-dependent protease domain.

- [...] the regulatory complex of the 26S proteasome [6], which is involved in the ATP-dependent degradation of ubiquitinated proteins:

a) Mammalian subunit 4 and homologs in other higher eukaryotes, in yeast (gene YTA5) and fission yeast (gene [...]).

b) Mammalian subunit 6 (TBP7) and homologs in other higher eukaryotes and in yeast (gene YTA2).
c) Mammalian subunit 7 (MSS1) and homologs in other higher eukaryotes and in yeast (gene CIM5 or YTA3).
d) Mammalian subunit 8 (P45) and homologs in other higher eukaryotes and in yeast (SUG1 or CIM3 or TBY1) and fission yeast (gene let1).

[0183] Other probable subunits, such as human TBP1, which seems to influence HIV gene expression by interacting with the virus tat transactivator protein, and yeast YTA1 and YTA6.

- Yeast protein BCS1, a mitochondrial protein essential for the expression of the Rieske iron-sulfur protein.
- Yeast protein MSP1, a protein involved in intramitochondrial sorting of proteins.

- Yeast protein PAS8, and the corresponding proteins PAS5 from Pichia pastoris and PAY4 from Yarrowia lipolytica.
- Mouse protein SKD1 and its fission yeast homolog (SpAC2G11.06).
- Caenorhabditis elegans meiotic spindle formation protein mei-1.
- Yeast protein SAP1.
- Yeast protein YTA7.

- Mycobacterium leprae hypothetical protein A2126A.

[...] ATP-dependent protein clamps [5].

[...] ATP-binding motifs, which are located in the N-terminal half of this domain, there is a highly conserved region located in the central part of the domain which was used to develop a signature pattern.
Consensus pattern: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIF]-x-R
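PROSITE-style consensus patterns such as the one above map directly onto regular expressions: `x` is any residue, `[...]` lists allowed residues, `{...}` lists forbidden residues, `(n)` or `(n,m)` gives a repeat count, and `<` or `>` anchor a pattern to the N- or C-terminus. The minimal translator below handles only those elements (not, for example, `>` inside brackets); the function name is hypothetical, and the test sequence used with it is synthetic rather than a real AAA protein.

```python
import re

# One pattern element, e.g. "D", "x", "[LIVM]", "{PHY}", "x(4)", "x(2,4)".
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    out = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, low, high = m.groups()
        if core == "x":
            token = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            token = core                         # literal residue or [allowed set]
        if low and high:
            token += "{" + low + "," + high + "}"
        elif low:
            token += "{" + low + "}"
        out.append(prefix + token + suffix)
    return "".join(out)


if __name__ == "__main__":
    aaa = "[LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIF]-x-R"
    regex = prosite_to_regex(aaa)
    print(regex)
    print(bool(re.search(regex, "MKLAILAGTNAAAALDAALARGG")))  # True
```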

[1] Froehlich K.-U., Fries H.W., Ruediger M., Erdmann R., Botstein D., Mecke D. J. Cell Biol. 114:443-453(1991).
[2] Erdmann R., Wiebel F.F., Flessau A., Rytka J., Beyer A., Froehlich K.-U., Kunau W.-H. Cell 64:499-510(1991).
[3] Peters J.-M., Walsh M.J., Franke W.W. EMBO J. 9:1757-1767(1990).
[4] Kunau W.-H., Beyer A., Goette K., Marzioch M., Saidowsky J., Skaletz-Rorowski A., Wiebel F.F. Biochimie 75:209-224(1993).
[5] Confalonieri F., Duguet M. BioEssays 17:639-650(1995).
[6] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

2. ABC Membrane (ABC transporter transmembrane region). This family represents a unit of six transmembrane helices. Many members of the ABC transporter family (ABC Tran) have two such regions. See also descriptions of ABC Tran, below, and ABC2 membrane, above.

3. (ABC Tran) 

ABC transporters family signature 

[0187] On the basis of sequence similarities, a family of related ATP-binding proteins has been characterized [1 to 5]. These proteins are associated with a variety of distinct biological processes in both prokaryotes and eukaryotes, but a majority of them are involved in active transport of small hydrophilic molecules across the cytoplasmic membrane. All these proteins share a conserved domain of some two hundred amino acid residues, which includes an ATP-binding site. These proteins are collectively known as ABC transporters. Proteins known to belong to this family are listed below (references are only provided for recently determined sequences).
In prokaryotes:
- Active transport systems components: alkylphosphonate uptake (phnC/phnK/phnL); arabinose (araG); arginine (artP); dipeptide (dciAD, dppD/[...]); [...] galactoside (mglA); glutamine (glnQ); glycerol-3-phosphate (ugpC); glycine betaine/L-proline (proV); glutamate/aspartate (gltL); histidine (hisP); iron(III) (sfuC); iron(III) dicitrate (fecE); lactose (lacK); leucine/isoleucine/valine (braF/braG, livF/livG); maltose (malK); molybdenum (modC); nickel (nikD/nikE); oligopeptide (amiE/amiF, oppD/oppF); peptide (sapD/sapF); phosphate (pstB); putrescine (potG); ribose (rbsA); spermidine/putrescine (potA); sulfate (cysA); vitamin B12 (btuD).
- Hemolysin/leukotoxin export proteins hlyB, cyaB and lktB.
- Colicin V export protein cvaB.
- Lactococcin export protein lcnC [6].
- Lantibiotic transport proteins nisT (nisin) and spaT (subtilin).
- Extracellular proteases B and C export protein prtD.
- Alkaline protease secretion protein [...].
- Beta-(1,2)-glucan export proteins chvA and ndvA.
- Haemophilus influenzae capsule-polysaccharide export protein bexA.
- Cytochrome c biogenesis protein ccmA (also known as cycV and helA).
- Polysialic acid transport protein kpsT.
- Cell division associated ftsE protein (function unknown).
- Copper processing protein nosF from Pseudomonas stutzeri.
- Nodulation protein nodI from Rhizobium (function unknown).
- Escherichia coli proteins cydC and [...] (gene uvrA).
- Erythromycin resistance protein from Staphylococcus epidermidis (gene msrA).
- Tylosin resistance protein from Streptomyces fradiae (gene tlrC) [7].
- Heterocyst [...] of a high affinity transport system.
- yhbG, a putative protein whose gene is linked with ntrA in many bacteria such as Escherichia coli, Klebsiella pneumoniae, Pseudomonas putida, Rhizobium meliloti and Thiobacillus ferrooxidans.
- [...] hypothetical proteins yabJ, yadG, yagC, ybbA, ycjW, yddA, yehX, yejR, yheS, yhiG, yhiH, yjcW, yjjK, yojI, yrbF and ytfR.
In eukaryotes:
- The multidrug transporters (Mdr) (P-glycoprotein), a family of closely related proteins which extrude a wide variety of drugs out of the cell (for a review see [8]).
- Cystic fibrosis transmembrane conductance regulator (CFTR), which is most probably involved in the transport of chloride ions.
- Antigen peptide transporters 1 (TAP1, PSF1, RING4, HAM-1, mtp1) and 2 (TAP2, PSF2, RING11, HAM-2, mtp2), which are involved in the transport of antigens from the cytoplasm to a membrane-bound compartment for association with [...].
- [...] membrane protein (PMP70).
- ALDP, a peroxisomal protein involved in X-linked adrenoleukodystrophy [9].
- Sulfonylurea receptor [10], a putative subunit of the B-cell ATP-sensitive potassium [...].
- [...] brown (bw), which are involved in the import of ommatidium screening pigments.
- Fungal elongation factor 3 (EF-3).
- Yeast STE6, which is responsible for the export of the a-factor pheromone.
- Yeast mitochondrial transporter ATM1.
- Yeast MDL1 and MDL2.
- Yeast SNQ2.
- Yeast sporidesmin resistance [...].
- [...] protein [...]. This protein is probably involved in the transport of metal-bound phytochelatins.
- Fission yeast brefeldin A resistance protein (gene bfr1 or [...]).
- [...] protein (gene pmd1).
- mbpX, a hypothetical chloroplast protein from [...].
- [...] protein tagB from slime mold. This protein consists of two domains: an N-terminal subtilase [...] ABC transporter domain.
As a signature pattern for this class of proteins, a conserved region which is located between the 'A' and the 'B' motifs of the ATP-binding site was used.
Consensus pattern: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-[KRQASPCLIMFW]-[KRNQSTAVM]-[KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-{FYWHP}-{KRHP}-[LIVMFYWSTA]
The ATP-binding region is duplicated in araG, mdl, msrA, rbsA, tlrC, uvrA, yejF, Mdr's, CFTR, pmd1 and in EF-3. In some of those proteins the above pattern only detects one of the two copies of the domain. The proteins belonging to this family also contain one or two copies of the ATP-binding motifs 'A' and 'B'.
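Because the ATP-binding region is duplicated in several of the proteins listed, counting signature hits per sequence is a quick first look. The sketch below hand-translates the ABC transporter consensus pattern into a regular expression (each `{...}` becomes a negated character class) and reports non-overlapping hits. As noted in the text, the pattern can miss one of the two copies, so a count of one does not rule out a duplicated domain, and a count of two only suggests one. The function name and the synthetic test sequence are hypothetical.

```python
from __future__ import annotations

import re

# Hand translation of the ABC transporter consensus pattern, one class per position.
ABC_SIGNATURE = re.compile(
    "[LIVMFYC][SA][SAPGLVFYKQH]G[DENQMW][KRQASPCLIMFW][KRNQSTAVM]"
    "[KRACLVM][LIVMFYPAN][^PHY][LIVMFW][SAGCLIVP][^FYWHP][^KRHP][LIVMFYWSTA]"
)


def signature_hits(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in ABC_SIGNATURE.finditer(protein.upper())]


if __name__ == "__main__":
    # Synthetic sequence with two copies of a signature-like motif.
    demo = "MK" + "LSGGQRQRVAIARAL" + "GGGGG" + "LSGGQRQRVAIARAL" + "K"
    for start, motif in signature_hits(demo):
        print(start, motif)
```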
[1] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[2] Higgins C.F., Gallagher M.P., Mimmack M.M., Pearce S.R. BioEssays 8:111-116(1988).
[3] Higgins C.F., Hiles I.D., Salmond G.P.C., Gill D.R., Downie J.A., Evans I.J., Holland I.B., Gray L., Buckel S.D., Bell A.W., Hermodson M.A. Nature 323:448-450(1986).
[4] Doolittle R.F., Johnson M.S., Husain I., van Houten B., Thomas D.C., Sancar A. Nature 323:451-453(1986).
[5] Blight M.A., Holland I.B. Mol. Microbiol. 4:873-880(1990).
[6] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L. Appl. Environ. Microbiol. 58:1952-1961(1992).
[7] Rosteck P.R. Jr., Reynolds P.A., Hershberger C.L. Gene 102:27-32(1991).
[8] Gottesman M.M., Pastan I. J. Biol. Chem. 263:12163-12166(1988).
[9] Valle D., Gaertner J. Nature 361:682-683(1993).
[10] Aguilar-Bryan L., Nichols C.G., Wechsler S.W., Clement J.P. IV, Boyd A.E. III, Gonzalez G., Herrera-Sosa H., Nguy K., Bryan J., Nelson D.A. Science 268:423-426(1995).

4. (ACBP)

Acyl-CoA-binding protein signature 

[0189] Acyl-CoA-binding protein (ACBP) is a small (10 kDa) protein that binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters [1]. ACBP is also known as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the benzodiazepine (BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts as a neuropeptide to modulate the action of the GABA receptor [2]. ACBP is a highly conserved protein of about 90 residues that has so far been found in vertebrates, insects and yeast. ACBP is also related to the N-terminal section of a probable transmembrane protein of unknown function which has been found in mammals. As a signature pattern, the region that corresponds to residues 19 to 37 in mammalian ACBP was selected.
Consensus pattern: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-x-[FY]-K-Q-[STA](2)-x-G-

[1] Rose T.M., Schultz E.R., Todaro G.J. Proc. Natl. Acad. Sci. U.S.A. 89:11287-11291(1992).
[2] Costa E., Guidotti A. Life Sci. 49:325-344(1991).

5. (AIRS) 

AIR synthase related proteins 

[0190] This family includes Hydrogen expression/formation protein HypE, AIR synthases, FGAM synthase and selenide, water dikinase.

6. (AMP-binding) 

Putative AMP-binding domain signature 

[01 91] It has been shown [1 to 5] that a number of prokaryotic and eukaryotic enzymes which all probably act via an 
ATP-dependent covalent binding of AMP to their substrate, share a region of sequence similarity. These enzymes are 
- Insects luciferase (luciferin 4-monooxygenase). Luciferase produces light by catalyzing the oxidation of luciferin in 
presence of ATP and molecular oxygen, - Alpha-aminoadipate reductase from yeast (gene LYS2). This enzyme cata- 
lyzes the activation of alpha-aminoadipate by ATP-dependent adenylation and the reduction of activated alpha-aminoadipate by NADPH. - Acetate-CoA ligase (acetyl-CoA synthetase), an enzyme that catalyzes the formation of acetyl-CoA from acetate and CoA. - Long-chain-fatty-acid-CoA ligase, an enzyme that activates long-chain fatty acids for both the synthesis of cellular lipids and their degradation via beta-oxidation. - 4-Coumarate-CoA ligase (4CL), a plant enzyme that catalyzes the formation of 4-coumarate-CoA from 4-coumarate and coenzyme A, the branchpoint reaction between general phenylpropanoid metabolism and pathways leading to various specific end products. - O-Succinylbenzoic acid-CoA ligase (OSB-CoA synthetase) (gene menE) [6], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2). - 4-Chlorobenzoate-CoA ligase (EC 6.2.1.-) (4-CBA-CoA ligase) [7], a Pseudomonas enzyme involved in the degradation of 4-CBA. - Indoleacetate-lysine ligase (IAA-lysine synthetase) [8], an enzyme from Pseudomonas syringae that converts indoleacetate to IAA-lysine. - Bile acid-CoA ligase (gene baiB) from Eubacterium strain VPI 12708 [4]. This enzyme catalyzes the ATP-dependent formation of a variety of C-24 bile acid-CoAs. - Crotonobetaine/carnitine-CoA ligase (EC 6.3.2.-) from Escherichia coli (gene caiC). - L-(alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACV synthetase) from various fungi (gene acvA or pcbAB). This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin: the formation of ACV from the constituent amino acids. The amino acids seem to be activated by adenylation. It is a protein of around 3700 amino acids that contains three related domains of about 1000 amino acids. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the first step in the biosynthesis of the cyclic antibiotic gramicidin S: the ATP-dependent racemization of phenylalanine. - Tyrocidine synthetase I (gene tycA) from Bacillus brevis. The reaction carried out by tycA is identical to that catalyzed by grsA. - Gramicidin S synthetase II (gene grsB) from Bacillus brevis. This enzyme is a multifunctional protein that activates and polymerizes proline, valine, ornithine and leucine. GrsB consists of four related domains. - Enterobactin synthetase components E (gene entE) and F (gene entF) from Escherichia coli. These two enzymes are involved in the ATP-dependent activation of 2,3-dihydroxybenzoate and serine, respectively, during enterobactin (enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2
contain three related domains, while subunit 3 contains only a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme activates the four amino acids (Pro, L-Ala, D-Ala and 2-amino-9,10-epoxy-8-oxodecanoic acid) that make up HC-toxin, a cyclic tetrapeptide. HTS1 consists of four related domains. There are also some proteins whose exact function is not yet known, but which are very probably also AMP-binding enzymes. These proteins are: - ORA (octapeptide-repeat antigen), a Plasmodium falciparum protein whose function is not known but which shows a high degree of similarity with the above proteins. - AngR, a Vibrio anguillarum protein. AngR is thought to be a transcriptional activator which modulates the anguibactin (an iron-binding siderophore) biosynthesis gene cluster operon. However, it is believed [9] that angR is not a DNA-binding protein, but rather an enzyme involved in the biosynthesis of anguibactin. This conclusion is based on three facts: the presence of the AMP-binding domain; the size of angR (1048 residues), which is far bigger than any bacterial transcriptional protein; and the presence of a probable S-acyl thioesterase immediately downstream of angR. - A hypothetical protein in the mmsB 3' region in Pseudomonas aeruginosa. - Escherichia coli hypothetical protein ydiD. - Yeast hypothetical protein YBR041w. - Yeast hypothetical protein YBR222C. - Yeast hypothetical protein YER147C. All these proteins contain a highly conserved region, very rich in glycine, serine and threonine, which is followed by a conserved lysine. A parallel can be drawn between this type of domain and the G-x(4)-G-K-[ST] ATP/GTP-binding 'P-loop' domain, or the G-x-G-x(2)-[SG]-x(10,20)-K ATP-binding domain of protein kinases.

[0192] Consensus pattern: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR]. In a majority of cases, the residue that follows the Lys at the end of the pattern is a Gly.
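The consensus patterns in this section are written in PROSITE-style notation. x stands for any residue, [...] lists the residues allowed at a position, {...} lists excluded residues, (n) or (n,m) gives a repeat count, and positions are joined by hyphens. The sketch below is a minimal, hypothetical helper; prosite_to_regex is not part of any library. It translates such a pattern into a Python regular expression and applies the AMP-binding pattern above to an invented peptide. It assumes uppercase one-letter residue codes and does not handle rarer PROSITE features such as a ">" inside brackets.

```python
import re

# One PROSITE-style element: optional "<" anchor, a core (x, a single residue,
# an [allowed] set or an {excluded} set), an optional repeat count such as
# (2) or (2,3), and an optional ">" anchor.
_ELEMENT = re.compile(r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    pieces = []
    for element in pattern.replace(" ", "").rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            token = core                     # a residue or an [allowed] set is already valid regex
        if low is not None:
            token += "{" + low + ("," + high if high is not None else "") + "}"
        pieces.append(("^" if start else "") + token + ("$" if end else ""))
    return "".join(pieces)


# AMP-binding pattern as given in the text.
AMP_BINDING = "[LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR]"

seq = "GGLAASSGTTGAPKGV"  # invented peptide, for illustration only
hit = re.search(prosite_to_regex(AMP_BINDING), seq)
if hit:
    # Residue immediately after the terminal [KR]; the text says it is usually Gly.
    following = seq[hit.end()] if hit.end() < len(seq) else None
    print(hit.group(), hit.span(), following)
```

The search reports the matched segment, its 0-based span and the residue that follows it. That residue can then be compared with the observation that it is usually Gly.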
[ 1] Toh H. Protein Seq. Data Anal. 4:111-117(1991).

[ 2] Smith D.J., Earl A.J., Turner G. EMBO J. 9:2743-2750(1990).

[ 3] Schroeder J. Nucleic Acids Res. 17:460-460(1989).

[ 4] Mallonee D.H., Adams J.L., Hylemon P.B. J. Bacteriol. 174:2065-2071(1992).
[ 5] Turgay K., Krause M., Marahiel M.A. Mol. Microbiol. 6:529-546(1992).

[ 6] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).

[ 7] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H., Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[ 8] Farrell D.H., Mikesell P., Actis L.A., Crosa J.H. Gene 86:45-51(1990).
7. AP2 domain



[0193] This 60-amino-acid-residue domain can bind to DNA [1] and is plant-specific. Members of this family are suggested to be related to pyridoxal phosphate-binding domains, such as those found in aminotran 2 [3]. AP2 domains are also described in Jofuku et al., copending U.S. Patent applications 08/700,152, 08/879,827, 08/912,272 and 09/026,039.

[1] Ohme-Takagi M, Shinshi H; Plant Cell 1995;7:173-182.
[2] Weigel D; Plant Cell 1995;7:388-389.
[3] Mushegian AR, Koonin EV; Genetics 1996;144:817-828.

8. ARID 

[0194] The ARID domain is an AT-rich interaction domain that shares structural homology with DNA replication and repair nucleases and polymerases.

[1] Herrscher RF, Kaplan MH, Leisz DL, Das C, Scheuermann R, Tucker PW; Genes Dev 1995;9:3067-3082.
[2] Yuan YC, Whitson RH, Liu Q, Itakura K, Chen Y; Nat Struct Biol 1998;5:959-964.
9. (ATP synt)

ATP synthase gamma subunit signature 



[0195] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1). The former acts as a proton channel; the latter is composed of five subunits: alpha, beta, gamma, delta and epsilon. Subunit gamma is believed to be important in regulating ATPase activity and the flow of protons through the CF(0) complex. The best conserved region of the gamma subunit is its C-terminus, which seems to be essential for assembly and catalysis. [...] In some species, gamma subunits are not detected by this motif.

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] [...] Maeda M., Mukohata Y., Futai M. FEBS Lett. 232:221-226(1988).

10. (ATP Synt A) 

ATP synthase a subunit signature

[0197] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, termed coupling factor CF(1). The CF(0) a subunit [...] is a highly hydrophobic protein that has been [...]

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).
[ 3] [...] Chang J.A., Simoni R.D. J. Biol. Chem. [...]
[ 4] Cain B.D., Simoni R.D. J. Biol. Chem. 264:3292-3300(1989).

11. ATP synthase B 

B subunits are thought to interact with the stalk of the CF(1) subunits. 



12. (ATP synt C) 
ATP synthase c subunit signature 



[0199] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector, called CF(0), which acts as a proton channel, and a catalytic core, called coupling factor CF(1). The CF(0) c subunit (also [...]) is a highly hydrophobic protein of about 8 Kd [...] ATPase. Structurally, subunit c consists of two long terminal hydrophobic regions, which probably [...] the ATPase activity. DCCD [...]

[0200] Consensus pattern: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE] [D or E binds DCCD]
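The annotation on this pattern singles out one residue: the Asp or Glu that binds DCCD. A named capture group is a simple way to report where that residue falls in a sequence. The sketch below is an illustration only. It reads the spacer in the pattern as x(10), giving [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE]. That reading, the function name dccd_site and the test sequence are all assumptions.

```python
import re
from typing import Optional

# ATP synthase c subunit signature, reading the spacer as x(10):
# [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE]
# The final [DE] sits in a named group because it is the DCCD-binding residue.
ATP_SYNT_C = re.compile(r"[GSTA]R[NQ]P.{10}[LIVMFYW]{2}.{3}[LIVMFYW].(?P<dccd>[DE])")


def dccd_site(sequence: str) -> Optional[int]:
    """Return the 1-based position of the putative DCCD-binding D/E, or None."""
    match = ATP_SYNT_C.search(sequence)
    if match is None:
        return None
    return match.start("dccd") + 1


# Invented sequence: three filler residues, then a segment built to fit the pattern.
demo = "MAA" + "GRNP" + "A" * 10 + "LL" + "AAA" + "V" + "A" + "E" + "KK"
print(dccd_site(demo))  # 25
```

Because the pattern has a fixed length, each start position allows only one alignment. The named group therefore always points at the last residue of the match.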

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Ivaschenko A.T., Karpenyuk T.A., Ponomarenko S.V. Biokhimiia 56:406-419(1991).
[ 4] Recipon H., Perasso R., Adoutte A., Quetier F. J. Mol. Evol. 34:292-303(1992).

13. (ATP synt DE) 

ATP synthase, Delta/Epsilon chain 

[0201] Part of the ATP synthase CF(1). These subunits are part of the head unit of the ATP synthase. They are called delta and epsilon in human and metazoan species. In bacterial species, however, the delta (D) subunit is the equivalent of the oligomycin-sensitive subunit (OSCP) of metazoans.

14. (ATP syntab) 

ATP synthase alpha and beta subunits signature 

[0202] ATP synthase (proton-translocating ATPase) [1,2] is a component of the cytoplasmic membrane of eubacteria, the inner membrane of mitochondria, and the thylakoid membrane of chloroplasts. The ATPase complex is composed of an oligomeric transmembrane sector, called CF(0), and a catalytic core, called coupling factor CF(1). The former acts as a proton channel; the latter is composed of five subunits: alpha, beta, gamma, delta and epsilon. The sequences of subunits alpha and beta are related, and both contain a nucleotide-binding site for ATP and ADP. The beta chain has catalytic activity, while the alpha chain is a regulatory subunit. Vacuolar ATPases [3] (V-ATPases) are responsible for acidifying a variety of intracellular compartments in eukaryotic cells. Like F-ATPases, they are oligomeric complexes of a transmembrane and a catalytic sector. The sequence of the largest subunit of the catalytic sector (70 Kd) is related to that of the F-ATPase beta subunit, while a 60 Kd subunit from the same sector is related to the F-ATPase alpha subunit [4]. Archaebacterial membrane-associated ATPases are composed of three subunits. Their alpha chain is related to the F-ATPase beta chain, and their beta chain is related to the F-ATPase alpha chain [4]. A protein highly similar to F-ATPase beta subunits is found [5] in some bacterial systems involved in a specialized protein export pathway that proceeds without signal peptide cleavage. This protein is known as fliI in Bacillus and Salmonella, Spa47 (mxiS) in Shigella flexneri, HrpB6 in Xanthomonas campestris and yscN in Yersinia virulence plasmids. To detect these ATPase subunits, a segment of ten amino-acid residues containing two conserved serines was selected as a signature pattern. The first serine seems to be important for catalysis, at least in the ATPase alpha chain, as its mutagenesis causes catalytic impairment.

[0203] Consensus pattern: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S [The first S is a putative active site residue]
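The text notes that mutagenesis of the first serine of this motif impairs catalysis. The sketch below illustrates only the sequence side of that observation. It applies an in-silico Ser-to-Ala substitution at that position and checks whether the signature P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S still matches. The sequence and the helper mutate are invented for illustration. Losing the pattern match says nothing about enzyme activity.

```python
import re

# Alpha/beta subunit signature P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S; group 1 captures
# the first Ser, described in the text as a putative active-site residue.
ALPHA_BETA = re.compile(r"P[SAP][LIV][DNH].{3}(S).S")


def mutate(sequence: str, index: int, residue: str) -> str:
    """Return a copy of sequence with the residue at 0-based index replaced."""
    return sequence[:index] + residue + sequence[index + 1:]


wild_type = "KKPALDAAASGSKK"  # invented sequence containing the signature
site = ALPHA_BETA.search(wild_type).start(1)
mutant = mutate(wild_type, site, "A")  # in-silico Ser -> Ala substitution

print(site, ALPHA_BETA.search(wild_type) is not None, ALPHA_BETA.search(mutant) is not None)
```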

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).

[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Nelson N. J. Bioenerg. Biomembr. 21:553-571(1989).

[ 4] Gogarten J.P., Kibak H., Dittrich R., Taiz L., Bowman E.J., Bowman B.J., Manolson M.F., Poole R.J., Date T., Oshima T., Konishi J., Denda K., Yoshida M. Proc. Natl. Acad. Sci. U.S.A. 86:6661-6665(1989).
[ 5] Dreyfus G., Williams A.W., Kawagishi I., Macnab R.M. J. Bacteriol. 175:3131-3138(1993).

15. (ATP syntab C) 

ATP synthase alpha/beta chain, C-terminal domain.
[0204] Number of members: 190

[1] Abrahams JP, Leslie AG, Lutter R, Walker JE; "Structure at 2.8 A resolution of F1-ATPase from bovine heart mitochondria." Nature 1994;370:621-628.

16. (A deaminase) 

Adenosine and AMP deaminase signature 

[0205] Adenosine deaminase catalyzes the hydrolytic deamination of adenosine into inosine. AMP deaminase catalyzes the hydrolytic deamination of AMP into IMP. It has been shown [1] that these two types of enzymes share three regions of sequence similarity. These regions are centered on residues that are proposed to play an important role in the catalytic mechanism of both enzymes. One of these regions, containing two conserved aspartic acid residues that are potential active site residues, was selected as a signature pattern.

Consensus pattern: [SA]-[LIVM]-[NGS]-[STA]-D-D-P [The two D's are putative active site residues]
[1] Chang Z., Nygaard P., Chinault A.C., Kellems R.E. Biochemistry 30:2273-2280(1991).
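Signature patterns are typically used to scan many sequences at once. The minimal sketch below assumes the adenosine/AMP deaminase pattern reads [SA]-[LIVM]-[NGS]-[STA]-D-D-P. For every matching sequence, it reports the 1-based positions of the two aspartates flagged as putative active-site residues. The sequence names and residues are invented, and scan is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Adenosine/AMP deaminase signature, read as [SA]-[LIVM]-[NGS]-[STA]-D-D-P;
# the two Asp residues (putative active-site residues) are captured as groups.
DEAMINASE = re.compile(r"[SA][LIVM][NGS][STA](D)(D)P")


def scan(sequences: Dict[str, str]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each sequence name to the 1-based positions of the D-D pair in every hit."""
    hits: Dict[str, List[Tuple[int, int]]] = {}
    for name, seq in sequences.items():
        found = [(m.start(1) + 1, m.start(2) + 1) for m in DEAMINASE.finditer(seq)]
        if found:
            hits[name] = found
    return hits


# Invented sequences; only the first contains the signature.
toy = {"seq1": "MKSLNTDDPQ", "seq2": "MKSLNTDEPQ"}
print(scan(toy))  # {'seq1': [(7, 8)]}
```

Sequences without a hit are simply left out of the result. Downstream code can then treat the returned keys as the set of candidate family members.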

17. (Acetyltransf) 



Acetyltransferase (GNAT) family. 

[0206] This family contains proteins with N-acetyltransferase functions.
[1] Neuwald AF, Landsman D; Trends Biochem Sci 1997;22:154-155.



18. (Aconitase C) 



Aconitase family signature 



[0207] Aconitase (aconitate hydratase) (EC 4.2.1.3) [1] is the enzyme from the tricarboxylic acid cycle that catalyzes the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the course of the reaction. In eukaryotes, two isozymes of aconitase are known to exist: one found in the mitochondrial matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family also contains the following proteins: - Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'UTR of ferritin and delta aminolevulinic acid synthase mRNAs, and in the 3'UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase activity. - 3-Isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the second step in the biosynthesis of leucine. - Homoaconitase (EC 4.2.1.36) (homoaconitate hydratase), an enzyme that participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into homoisocitric acid. - Escherichia coli protein ybhJ. As a signature for proteins from the aconitase family, two conserved regions that contain the three cysteine ligands of the 4Fe-4S cluster were selected.

Consensus pattern: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA] [C binds the iron-sulfur center]

Consensus pattern: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the iron-sulfur center]
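Together, the two aconitase patterns account for the three cysteine ligands of the 4Fe-4S cluster: one Cys in the first pattern and two in the second. The sketch below requires both signatures to be present and returns the three Cys positions, 1-based and sorted. It is a hypothetical helper run on an invented sequence. It makes no assumption about which signature comes first in a real protein.

```python
import re
from typing import List, Optional

# The two aconitase family signatures, with every cysteine ligand captured:
# [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA]
SIGNATURE_1 = re.compile(r"[LIVM].{2}[GSACIVM].[LIV][GTIV][STP](C).{0,1}TN[GSTANI].{4}[LIVMA]")
# G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA]
SIGNATURE_2 = re.compile(r"G.{2}[LIVWPQ].{3}[GAC](C)[GSTAM][LIMPTA](C)[LIMV][GA]")


def cysteine_ligands(sequence: str) -> Optional[List[int]]:
    """Return sorted 1-based positions of the three Cys ligands, or None if a signature is missing."""
    first = SIGNATURE_1.search(sequence)
    second = SIGNATURE_2.search(sequence)
    if first is None or second is None:
        return None
    return sorted([first.start(1) + 1, second.start(1) + 1, second.start(2) + 1])


# Invented sequence: one segment built for each signature, separated by filler.
demo = "KK" + "LAAGAIGSCTNGAAAAL" + "KKKK" + "GAALAAAGCGLCLG" + "KK"
print(cysteine_ligands(demo))  # [11, 32, 35]
```

The variable spacer x(0,1) becomes .{0,1}. The regex engine tries one residue first and backtracks to zero when T-N does not follow.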

[ 1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).
19. (Acyl-CoAdh) 



Acyl-CoA dehydrogenases signatures 



[0208] Acyl-CoA dehydrogenases [1,2,3] are enzymes that catalyze the alpha,beta-dehydrogenation of acyl-CoA esters and transfer electrons to ETF, the electron transfer protein. Acyl-CoA dehydrogenases are FAD flavoproteins. This family currently includes: - Five eukaryotic isozymes that catalyze the first step of the beta-oxidation cycles for fatty acids with various chain lengths. These are the short (SCAD) (EC 1.3.99.2), medium (MCAD) (EC 1.3.99.3), long (LCAD) (EC 1.3.99.13), very-long (VLCAD) and short/branched (SBCAD) chain acyl-CoA dehydrogenases. These enzymes are located in the mitochondrion. They are all homotetrameric proteins of about 400 amino acid residues, except VLCAD, which is a dimer and which contains about 600 residues in its mature form. - Glutaryl-CoA dehydrogenase (EC 1.3.99.7) (GCDH), which is involved in the catabolism of lysine, hydroxylysine and tryptophan. - Isovaleryl-CoA dehydrogenase (EC 1.3.99.10) (IVD), involved in the catabolism of leucine. - Acyl-CoA dehydrogenases acsA and mmgC from Bacillus subtilis. - Butyryl-CoA dehydrogenase (EC 1.3.99.2) from Clostridium acetobutylicum. - Escherichia coli protein caiA [4]. - Escherichia coli protein aidB. Two conserved regions were selected as signature patterns. The first is located in the center of these enzymes, the second in the C-terminal section.
Consensus pattern: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA] 
Consensus pattern: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN] 

[ 1] Tanaka K., Ikeda Y., Matsubara Y., Hyman D.B. Enzyme 38:91-107(1987).

[ 2] Matsubara Y., Indo Y., Naito E., Ozasa H., Glassberg R., Vockley J., Ikeda Y., Kraus J., Tanaka K. J. Biol. Chem. 264:16321-16331(1989).

[ 3] Aoyama T., Ueno I., Kamijo T., Hashimoto T. J. Biol. Chem. 269:19088-19094(1994).

[ 4] Eichler K., Bourgis R., Buchet A., Kleber H.-R., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).

20. (Acyl transf) 

Acyl transferase domain 

[0209] Number of members: 161

[1] Serre L, Verbree EC, Dauter Z, Stuitje AR, Derewenda ZS; Medline: 95286570. "The Escherichia coli malonyl-CoA:acyl carrier protein transacylase at 1.5-A resolution. Crystal structure of a fatty acid synthase component." J Biol Chem 1995;270:12961-12964.

21. Acylphosphatase signatures 

[0210] Acylphosphatase (EC 3.6.1.7) [1,2] catalyzes the hydrolysis of the carboxyl-phosphate bonds of various acylphosphates, such as carbamyl phosphate, succinyl phosphate and 1,3-diphosphoglycerate. The physiological role of this enzyme is not yet clear. Acylphosphatase is a small protein of around 100 amino-acid residues. There are two known isozymes. One seems to be specific to muscular tissues; the other, called the 'organ-common type', is found in many different tissues. Acylphosphatase has so far been characterized only in vertebrates. However, a number of bacterial and archaebacterial hypothetical proteins are highly similar to this enzyme and probably possess the same activity. These proteins are: - Escherichia coli hypothetical protein yccX. - Bacillus subtilis hypothetical protein yflL. - Archaeoglobus fulgidus hypothetical protein AF0818. Two conserved regions were selected as signature patterns. The first is located in the N-terminal section, while the second is found in the central part of the protein sequence.
Consensus pattern: [LIV]-x-G-x-V-Q-G-V-x-[FM]-R

Consensus pattern: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G

[ 1] Stefani M., Ramponi G. Life Chem. Rep. 12:271-301(1995).

[ 2] Stefani M., Taddei N., Ramponi G. Cell. Mol. Life Sci. 53:141-151(1997).

22. (Adap comp sub) 

Clathrin adaptor complexes medium chain signatures. 

[0211] Clathrin-coated vesicles (CCV) mediate intracellular membrane traffic such as receptor-mediated endocytosis. In addition to clathrin, the CCV are composed of a number of other components. These include oligomeric complexes known as adaptor or clathrin assembly protein (AP) complexes [1]. The adaptor complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. In mammals, two types of adaptor complexes are known: AP-1, which is associated with the Golgi complex, and AP-2, which is associated with the plasma membrane. Both AP-1 and AP-2 are heterotetramers that consist of two large chains (the adaptins), a medium chain (AP47 in AP-1; AP50 in AP-2) and a small chain (AP19 in AP-1; AP17 in AP-2). The medium chains of AP-1 and AP-2 are evolutionarily related proteins of about 50 Kd. Homologs of AP47 and AP50 have also been found in Caenorhabditis elegans (genes unc-101 and ap50) [2] and in yeast (gene APM1 or YAP54) [3]. Some more divergent, but clearly evolutionarily related, proteins have also been found in yeast: APM2 and YBR288c. Two conserved regions were selected as signature patterns. One is located in the N-terminal region and the other in the central section of these proteins.

Consensus pattern: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-[LIVMT]-E
Consensus pattern: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y

[ 1] Pearse B.M., Robinson M.S. Annu. Rev. Cell Biol. 6:151-171(1990).
[ 2] Lee J., Jongeward G.D., Sternberg P.W. Genes Dev. 8:60-73(1994).

[ 3] Nakayama Y., Goebl M., O'Brine G.B., Lemmon S., Pingchang C.E., Kirchhausen T. Eur. J. Biochem. 202:569-574(1991).
23. (Adenylsucc synt)
Adenylosuccinate synthetase signatures 



Consensus pattern: Q-W-G-D-E-G-K-G 

Consensus pattern: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R [K is the active site residue]

[...] Noegel A.A., Schleicher M. J. Biol. Chem. 266:2480-2485(1991).

[...] Hoffman C.R., Fromm H.J., Honzatko R.B. J. Mol. Biol. [...]

[ 3] Bouyoub A., Barbier G., Forterre P., Labedan B. J. Mol. Biol. 261:144-[...](1996).

24. (AdoHcyase) 

S-adenosyl-L-homocysteine hydrolase signatures 

[...] AdoHcyase is a ubiquitous enzyme which [...] and homocysteine. [...] a region thought to be involved in NAD-binding, a glycine-rich region in the central part of AdoHcyase [...].

Consensus pattern: [GSA]-[CS]-N-x-[FYLM]-S-[ST]-[QA]-[DEN]-x-[AV]-[AT]-[AD]-[AC]-[LIVMCG]

Consensus pattern: [GA]-[KS]-x(3)-[LIV]-x-G-[FY]-G-x-[...]

[ 1] Sganga M.W., Aksamit R.R., Cantoni G.L., Bauer C.E. Proc. Natl. Acad. Sci. U.S.A. 89:6328-6332(1992).

25. AhpC/TSA family 

[...] This family contains proteins related to alkyl hydroperoxide reductase (AhpC) and thiol-specific antioxidant (TSA).

[1] Chae HZ, Robison K, Poole LB, Church G, Storz G, Rhee SG; Proc Natl Acad Sci U S A 1994;91:7017-7021.

26. (Aldose epim) 

[0215] Aldose 1-epimerase putative active site. Aldose 1-epimerase (EC 5.1.3.3) (mutarotase) is the enzyme responsible for the anomeric interconversion of D-glucose and other aldoses [...].

Consensus pattern: [NS]-x-T-N-H-x-Y-[FW]-N-[LI]

[ 1] Poolman B., Royer T.J., Mainzer S.E., Schmidt B.R. J. Bacteriol. 172:4037-4047(1990).
27. (AlkA DNA repair) 

Alkylbase DNA glycosidases alkA family signature 
[...] of alkylation products (EC 3.2.2.21) [...] alkA [...]. Yeast alkylbase DNA glycosidase (gene MAG1) [...] is structurally related to alkA. MAG and alkA are proteins [...] residues. While the C- and N-terminal ends appear to be unrelated, there [...]. A section of this region has been selected as a signature pattern.

Consensus pattern: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[...]
[ 1] Lindahl T., Sedgwick B. Annu. Rev. Biochem. 57:133-157(1988).
[ 3] Chen J., Derfler B., Samson L. EMBO J. 9:4569-4575(1990).
28. Ammonium transporters signature 

[...] ammonium transporters MEP1, MEP2 and MEP3. - Arabidopsis thaliana [...]. - Corynebacterium glutamicum ammonium and [...]. - [...] putative ammonium transporter amtB. - Bacillus subtilis nrgA. - [...] MtCY338.09c. - Synechocystis strain PCC 6803 hypothetical proteins [...]. - [...] hypothetical proteins MJ0058 and MJ1343. - Caenorhabditis elegans [...] hypothetical [...]. As expected from their transport function, these proteins are highly [...] 10 to 12 transmembrane domains. The best conserved region [...] (sixth) transmembrane region and is used as a signature pattern.

Consensus pattern: D-[FYWS]-[AS]-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-[...]

[ 1] Ninnemann O., Jauniaux J.-C., Frommer W.B. EMBO J. 13:3464-3471(1994).

[ 2] Siewe R.M., Weil B., Burkovski A., Eikmanns B., Eikmanns [...] J. Biol. Chem. 271:5398-5403(1996).

[ 3] Saier M.H. Jr. Adv. Microbiol. Physiol. 40:81-136(1998).
29. (Arch_histone)
CBF/NF-Y subunits signatures 

[0218] Diverse DNA binding proteins are known to bind the CCAAT box [...] in the promoter and enhancer regions of a large number of genes in eukaryotes. [...] known as the CCAAT-binding factor (CBF) or NF-Y [1]. CBF is a heteromeric [...] components, both needed for DNA-binding. The HAP protein [...] cytochrome C iso-1 gene (CYC1) as well as other genes [...] electron transport, and activates their expression. It also recognizes the sequence CCAAT [...]. The first subunit of CBF, known as CBF-A or NF-YB in vertebrates [...], is a protein of 116 to 210 amino-acid residues which contains a [...] domain [...]. This domain seems to be involved in DNA-binding [...]. The second subunit of CBF, known as CBF-B or NF-YA in vertebrates [...], is a protein of 265 to 350 amino-acid residues which contains a [...] region. This region, called the 'essential core' [2], seems to consist of two subdomains: an N-terminal subunit-association domain and a C-terminal DNA recognition domain. A signature pattern has been developed [...].

Consensus pattern: C-V-S-E-x-I-S-F-[LIVM]-[...]
Consensus pattern: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-[...]-K-L-E

[1] [...] Nucleic Acids Res. 20:[...]

[2] Olesen J.T., Fikes J.D., Guarente L. Mol. Cell. Biol. 11:611-619(1991).

30. Argininosuccinate synthase signatures 
[...] the ATP-dependent ligation of citrulline to aspartate to form argininosuccinate, AMP and pyrophosphate [...]. [...] citrullinemia, a genetic disease characterized by severe vomiting spells and mental retardation. AS is a homotetrameric enzyme of chains of about 400 amino-acid residues. An arginine seems to be important for the enzyme's catalytic mechanism. The sequences of AS [...]. Two signature patterns were selected. The first is a highly conserved stretch of nine residues located in the N-terminal extremity of these enzymes. The second is derived from a conserved region which contains one of the conserved arginine residues.
Consensus pattern: [AS]-[FY]-S-G-G-[LV]-D-T-[ST]
Consensus pattern: G-x-T-x-K-G-N-D-x(2)-R-F

[ 1] [...] Glansdorff [...]

[ 2] Morris C.J., Reeve J.N. J. Bacteriol. 170:3125-3130(1988).

31. Armadillo/beta-catenin-like repeats 

[...] Tandem repeats form a superhelix of helices that is proposed to mediate the interaction of beta-catenin with its ligands. CAUTION: This family does not contain all known armadillo repeats.

[1] Huber AH, Nelson WJ, Weis WI; Cell 1997;90:871-882.

[2] Gumbiner BM; Curr Opin Cell Biol 1995;7:634-640.

[3] Cavallo R, Rubenstein D, Peifer M; Curr Opin Genet Dev 1997;7:459-466.

[4] Su LK, Vogelstein B, Kinzler KW; Science 1993;262:1734-1737.

[5] Masiarz FR, Munemitsu S, Polakis P; Science 1993;262:1731-1734.

[6] Peifer M, Wieschaus E; Cell 1990;63:1167-1176.

32. (Asn Synthase) 
Asparagine synthase 

[...] catalyse the conversion of aspartate to asparagine.

33. Asparaginase_2 
Asparaginase 1 2 members 

34. (Aspartyl tRNA N) 

Aminoacyl-transfer RNA synthetases class-II signatures

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: a cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and quaternary structure. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine and threonine are referred to as class-II synthetases [2 to 6]. They probably share a common folding pattern in their catalytic domain for the binding of ATP and amino acid, which is different from the Rossmann fold observed for the class-I synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]
Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
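This pattern contains an excluded-residue element. In a regular expression, a PROSITE-style {...} set becomes a negated character class [^...]. The sketch assumes the pattern reads [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]. The two test strings are invented and differ only at the excluded position.

```python
import re

# Class-II aminoacyl-tRNA synthetase signature 2, read as
# [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY].
# The PROSITE {..} element (excluded residues) becomes the negated class [^..].
CLASS_II_2 = re.compile(r"[GSTALVF][^DENQHRKP][GSTA][LIVMF][DE]R[LIVMF].[LIVMSTAG][LIVMFY]")

allowed = "GAGLERIAAL"  # Ala at the second position is not in the excluded set
blocked = "GKGLERIAAL"  # Lys at the second position is excluded

print(CLASS_II_2.fullmatch(allowed) is not None, CLASS_II_2.fullmatch(blocked) is not None)
```

A negated class also accepts characters that are not residue codes, such as X or *. Input may therefore need to be restricted to the twenty standard one-letter codes first.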

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).

[ 7] [...] Berthet-Colominas C., Haertlein M., Nasser N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque R., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).



[0223] 



[...] GD, Thelbert AS; J Biol Chem 1996;271:18859-[...]

[...] form a pair of alpha helices [...]

[0224] [1] Medline: 91289138. "Three-dimensional structure of the LDL receptor-binding domain of human apolipoprotein E." Wilson C, Wardell MR, Weisgraber KH, Mahley RW, Agard DA; Science [...]

37. Amino acid permeases signature 

[...] histidine permease (gene HIP1). - Yeast [...] permease (gene GNP1). - Yeast [...] valine and tyrosine permease [...]. - Yeast [...] (gene TAT2/SCM2). - Yeast choline transport protein (gene [...]). - Fission yeast protein isp5. - Fission yeast hypothetical protein SpAC11D3.08c. - Emericella nidulans proline transport protein (gene prnB). - Trichoderma harzianum amino acid permease INDA1. - Salmonella typhimurium [...] transports arginine or ornithine. These proteins seem to contain 10 to 12 transmembrane segments [...].

[ 1] Weber E., Chevalier M.R., Jund R. J. Mol. Evol. 27:341-350(1988).
[ 2] Vandenbol M., Jauniaux J.-C., Grenson M. Gene 83:153-159(1989).

[ 3] Reizer J., Finley K., Kakuda D., McLeod C.L., Reizer A., Saier M.H. Jr. Protein Sci. 2:20-30(1993).
38. aakinase (1) Glutamate 5-kinase signature 

[0226] Glutamate 5-kinase (EC 2.7.2.11) (gamma-glutamyl kinase) (GK) is the enzyme that catalyzes the first step in the biosynthesis of proline from glutamate: the ATP-dependent phosphorylation of L-glutamate into L-glutamate 5-phosphate. In eubacteria (gene proB) and yeast [1] (gene PRO1), GK is a monofunctional protein. In plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal GK domain and a C-terminal gamma-glutamyl phosphate reductase domain (EC 1.2.1.41) (see <PDOC00940>). A highly conserved glycine- and alanine-rich region located in the central section of these enzymes has been selected as a signature pattern. Yeast hypothetical protein YHR033W is highly similar to GK.

Consensus pattern: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G
[ 1] Li W., Brandriss M.C. J. Bacteriol. 174:4148-4156(1992).

[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).
aakinase (2) Aspartokinase signature 

[0227] Aspartokinase (EC 2.7.2.4) (AK) [1] catalyzes the phosphorylation of aspartate. The product of this reaction can then be used in the biosynthesis of lysine, or in the pathway leading to homoserine, which participates in the biosynthesis of threonine, isoleucine and methionine. In Escherichia coli, there are three different isozymes, which differ in their sensitivity to repression and inhibition by Lys, Met and Thr. AK1 (gene thrA) and AK2 (gene metL) are bifunctional enzymes, which both consist of an N-terminal AK domain and a C-terminal homoserine dehydrogenase domain. AK1 [...] (gene lysC) is monofunctional and involved in lysine synthesis. In yeast, there is a single isozyme of AK (gene HOM3). A conserved region located in the N-terminal extremity has been selected as a signature pattern for AK.

Consensus pattern: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM]

[ 1] Rafalski J.A., Falco S.C. J. Biol. Chem. 263:2146-2151(1988).

aakinase (3) Gamma-glutamyl phosphate reductase signature 

[0228] Gamma-glutamyl phosphate reductase (EC 1.2.1.41) (GPR) is the enzyme that catalyzes the second step in the biosynthesis of proline from glutamate: the NADP-dependent reduction of L-glutamate 5-phosphate into L-glutamate 5-semialdehyde and phosphate. In eubacteria (gene proA) and yeast [1] (gene PRO2), GPR is a monofunctional protein. In plants and mammals, it is a bifunctional enzyme (P5CS) [2] that consists of two domains: an N-terminal glutamate 5-kinase domain (EC 2.7.2.11) (see <PDOC00701>) and a C-terminal GPR domain. A conserved region that contains two histidine residues has been selected as a signature pattern. This region is located in the [...].
Consensus pattern: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I

[ 1] [...] Y., Payne J., Wolf S.S., Kalogeropoulos A., Schweizer M. Yeast 12:1021-1031(1996).

[ 2] Hu C.-A.A., Delauney A.J., Verma D.P.S. Proc. Natl. Acad. Sci. U.S.A. 89:9354-9358(1992).

39. (abhydrolase) Alpha/beta hydrolase fold. This catalytic domain is found in a very wide range of enzymes.

[0229] [1] Ollis DL, Cheah E, Cygler M, Dijkstra B, Frolow F, Franken SM, Harel M, Remington SJ, Silman I, Schrag J, Sussman JL, Verschueren KHG, Goldman A; Protein Eng 1992;5:197-211.

40. (Acid phosphat) Histidine acid phosphatases signatures 

[0230] Acid phosphatases (EC 3.1.3.2) are a heterogeneous group of proteins that hydrolyze phosphate esters optimally at low pH. It has been shown [1] that a number of acid phosphatases, from both prokaryotes and eukaryotes, share two regions of sequence similarity, each centered around a conserved histidine residue. These two histidines seem to be involved in the enzymes' catalytic mechanism [2,3]. The first histidine is located in the N-terminal section and forms a phosphohistidine intermediate. The second is located in the C-terminal section and possibly acts as a proton donor. Enzymes belonging to this family are called 'histidine acid phosphatases' and are listed below:

- Escherichia coli pH 2.5 acid phosphatase (gene appA).
- Escherichia coli glucose-1-phosphatase (EC 3.1.3.10) (gene agp).
- Yeast constitutive and repressible acid phosphatases (genes PHO3 and PHO5).
- Fission yeast acid phosphatase (gene pho1).
- Aspergillus phytases A and B (EC 3.1.3.8) (genes phyA and phyB).
- Mammalian lysosomal acid phosphatase.
- Mammalian prostatic acid phosphatase.
- Caenorhabditis elegans hypothetical proteins B0361.7, C05C10.1, C05C10.4 and F26C11.1.

[0231] Consensus pattern: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-… [H is the phosphohistidine residue]

Consensus pattern: [LIVMF]-x-[LIVMFAG]-x(2)-…

Note: … except for rat … phosphatase, which seems to have Tyr instead of the active site His.

[1] … Davidson R., Stevis P.E., MacArthur H., Moore D.L. J. Biol. Chem. 266:2313-2319(1991).
[2] Ostanin K., Harms E.H., Stevis P.E., Kuciel R., Zhou M.-M., Van Etten R.L. J. Biol. Chem. 267:22830-22836(1992).

[3] Schneider G., Lindqvist Y., Vihko P. EMBO J. 12:2609-2615(1993).

41. Aconitase family signatures 

Aconitase (EC 4.2.1.3) is the tricarboxylic acid cycle enzyme that catalyzes the reversible isomerization of citrate and isocitrate. Cis-aconitate is formed as an intermediary product during the course of the reaction. In eukaryotes two isozymes of aconitase are known to exist: one found in the mitochondrial matrix and the other found in the cytoplasm. Aconitase, in its active form, contains a 4Fe-4S iron-sulfur cluster; three cysteine residues have been shown to be ligands of the 4Fe-4S cluster. It has been shown that the aconitase family also contains the following proteins:

- Iron-responsive element binding protein (IRE-BP). IRE-BP is a cytosolic protein that binds to iron-responsive elements (IREs). IREs are stem-loop structures found in the 5'-UTR of ferritin and delta-aminolevulinic acid synthase mRNAs, and in the 3'-UTR of transferrin receptor mRNA. IRE-BP also expresses aconitase activity.
- 3-isopropylmalate dehydratase (EC 4.2.1.33) (isopropylmalate isomerase), the enzyme that catalyzes the second step in the biosynthesis of leucine.
- Homoaconitase (EC 4.2.1.36), an enzyme that participates in the alpha-aminoadipate pathway of lysine biosynthesis and that converts cis-homoaconitate into homoisocitric acid.
- Escherichia coli protein ybhJ.

Consensus pattern: …-x(2)-[GSACIVM]-x-[LIV]-[GTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-… [C binds the iron-sulfur cluster]
Consensus pattern: …-x(2)-[LIVMPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA] [The two C's bind the iron-sulfur cluster]
[1] Gruer M.J., Artymiuk P.J., Guest J.R. Trends Biochem. Sci. 22:3-6(1997).
42. Actins signatures 

[0233] Actins [1 to 4] are highly conserved contractile proteins that are present in all eukaryotic cells. In vertebrates there are three groups of actin isoforms: alpha, beta and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. In plants [5] there are many isoforms, which are probably involved in a variety of functions such as cell shape determination, tip growth and graviperception.

Actin exists either in a monomeric form (G-actin) or in a polymerized form (F-actin). Each actin monomer can bind a molecule of ATP; when polymerization occurs, the ATP is hydrolyzed. Actin is a protein of 374 to 376 amino acid residues, and its structure has been highly conserved in the course of evolution.

Recently some divergent actin-like proteins have been identified in several species. These proteins are:

- Centractin (actin-RPV) from mammals, fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii (actin-II). Centractin seems to be a component of a multi-subunit centrosomal complex involved in microtubule-based vesicle motility. This subfamily is also known as ARP1.
- ARP2 subfamily, which includes chicken ACTL, yeast ACT2 and Drosophila 14D.

Three signature patterns have been developed for this family. The first two are specific to actins and span positions 54 to 64 and 357 to 365. The last signature picks up both actins and the actin-like proteins and corresponds to positions 106 to 118 in actins.

Consensus pattern: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G
Consensus pattern: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]

Consensus pattern: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR]
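The stated residue spans of the actin signatures (54 to 64, 357 to 365 and 106 to 118) can be checked against the lengths of the patterns themselves. A minimal sketch, assuming PROSITE notation in which elements are separated by hyphens and may carry a repeat count such as x(2) or a range such as x(6,12); the helper name pattern_length_range is hypothetical:

```python
import re

# A PROSITE element with an optional repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(?:\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def pattern_length_range(pattern: str) -> tuple[int, int]:
    """Return the (minimum, maximum) number of residues a pattern spans."""
    low_total = high_total = 0
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        low, high = m.groups()
        n_min = int(low) if low else 1       # no count means exactly one residue
        n_max = int(high) if high else n_min
        low_total += n_min
        high_total += n_max
    return low_total, high_total


# Patterns and (first, last) positions as stated in the text above.
ACTIN_SIGNATURES = {
    "[FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G": (54, 64),
    "W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE]": (357, 365),
    "[LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTAQ]-x(2)-N-[KR]": (106, 118),
}

for pattern, (first, last) in ACTIN_SIGNATURES.items():
    n_min, n_max = pattern_length_range(pattern)
    print(f"{pattern}: {n_min}-{n_max} residues, stated span {last - first + 1}")
```

Because none of the three actin patterns contains a variable-length gap, each has a single fixed length (11, 9 and 13 residues), which agrees with the positions given in the text.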

[1] … (In) … 3rd Edition, Academic Press Ltd., London (1996).

[2] Pollard T.D., Cooper J.A. Annu. Rev. Biochem. 55:987-1036(1986).
[3] Pollard T.D. Curr. Opin. Cell Biol. 1:33-40(1990).
[4] Rubenstein P.A. BioEssays 12:309-315(1990).

[5] Meagher R.B., McLean B.G. Cell Motil. Cytoskeleton 16:164-166(1990).

43. Adenylate kinase signature 



… transferase), which is located in the mitochondrial matrix and which uses MgGTP instead of MgATP. …

Consensus pattern: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ]

[1] Schulz G.E. Cold Spring Harbor Symp. Quant. Biol. 52:429-439(1987).

[2] Liljelund P., Sanni A., Friesen J.D., Lacroute F. Biochem. Biophys. Res. Commun. 165:464-473(1989).
[4] Kath T.H., Schmid R., Schaefer G. Arch. Biochem. Biophys. 307:405-410(1993).
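44. Short-chain dehydrogenases/reductases family signature

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently known to belong to this family are listed below:
- Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.
- … from mammals.
- Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
- Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
- 3-beta-hydroxysteroid dehydrogenase (EC …) from Comamonas testosteroni.
- 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
- Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
- Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human.
- Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene …).
- 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
- Retinol dehydrogenase (EC 1.1.1.105) from mammals.
- 2-deoxy-D-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
- Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from … and Klebsiella pneumoniae (gene sorD).
- 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
- Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii.
- NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
- N-acylmannosamine 1-dehydrogenase (EC …) from … strain 141-8.
- D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
- Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
- Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania.
- 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC …) from Pseudomonas paucimobilis.
- Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC …) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL).
- Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
- Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
- Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
- 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA).
- Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals.
- Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
- Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
- Versicolorin reductase from Aspergillus parasiticus (gene VER1).
- Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
- A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity.
- Nodulation protein nodG from species of Azospirillum and Rhizobium, which is probably involved in the modification of the nodulation Nod factor fatty acyl chain.
- Nitrogen fixation protein fixR from Bradyrhizobium japonicum.
- Bacillus subtilis protein dltE, which is involved in the biosynthesis of D-alanyl-lipoteichoic acid.
- Human follicular variant translocation protein 1 (FVT1).
- Mouse adipocyte protein p27.
- Mouse protein Ke 6.
- Maize sex determination protein TASSELSEED 2.
- Sarcophaga peregrina 25 Kd development-specific protein.
- Drosophila fat body protein P6.
- A Listeria monocytogenes hypothetical protein encoded in the internalins gene region.
- Escherichia coli hypothetical proteins yciK, ydfG, yjgI, yjgU and yohF.
- Bacillus subtilis hypothetical proteins yoxD, ywfD and ywfH.
- Yeast hypothetical proteins YIL124w, YIR035c and YKL055c.
- Fission yeast hypothetical protein SpAC23D3.11.

One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a lysine, has been selected as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]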

[1] … Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:6003-6013(1995).

[2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).

[4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).
[0237] 45. (adh_short_C2) Short-chain dehydrogenases/reductases family signature 

The short-chain dehydrogenases/reductases family (SDR) [1] is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterized was Drosophila alcohol dehydrogenase, this family used to be called [2,3,4] 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. The proteins currently known to belong to this family are listed below:
- Alcohol dehydrogenase (EC 1.1.1.1) from insects such as Drosophila.

- … from mammals.
- Acetoacetyl-CoA reductase (EC 1.1.1.36) from various bacterial species (gene phbB or phaB).
- Glucose 1-dehydrogenase (EC 1.1.1.47) from Bacillus.
- 3-beta-hydroxysteroid dehydrogenase (EC …) from Comamonas testosteroni.
- 20-beta-hydroxysteroid dehydrogenase (EC 1.1.1.53) from Streptomyces hydrogenans.
- Ribitol dehydrogenase (EC 1.1.1.56) (RDH) from Klebsiella aerogenes.
- Estradiol 17-beta-dehydrogenase (EC 1.1.1.62) from human.
- Gluconate 5-dehydrogenase (EC 1.1.1.69) from Gluconobacter oxydans (gene …).
- 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100) from Escherichia coli (gene fabG) and from plants.
- Retinol dehydrogenase (EC 1.1.1.105) from mammals.
- 2-deoxy-D-gluconate 3-dehydrogenase (EC 1.1.1.125) from Escherichia coli and Erwinia chrysanthemi (gene kduD).
- Sorbitol-6-phosphate 2-dehydrogenase (EC 1.1.1.140) from … and Klebsiella pneumoniae (gene sorD).
- 15-hydroxyprostaglandin dehydrogenase (NAD+) (EC 1.1.1.141) from human.
- Corticosteroid 11-beta-dehydrogenase (EC 1.1.1.146) (11-DH) from mammals.
- 7-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.159) from Escherichia coli (gene hdhA), Eubacterium strain VPI 12708 (gene baiA) and from Clostridium sordellii.
- NADPH-dependent carbonyl reductase (EC 1.1.1.184) from mammals.
- Tropinone reductase-I (EC 1.1.1.206) and -II (EC 1.1.1.236) from plants.
- N-acylmannosamine 1-dehydrogenase (EC …) from … strain 141-8.
- D-arabinitol 2-dehydrogenase (ribulose forming) (EC 1.1.1.250) from fungi.
- Tetrahydroxynaphthalene reductase (EC 1.1.1.252) from Magnaporthe grisea.
- Pteridine reductase 1 (EC 1.1.1.253) (gene PTR1) from Leishmania.
- 2,5-dichloro-2,5-cyclohexadiene-1,4-diol dehydrogenase (EC …) from Pseudomonas paucimobilis.
- Cis-1,2-dihydroxy-3,4-cyclohexadiene-1-carboxylate dehydrogenase (EC …) from Acinetobacter calcoaceticus (gene benD) and Pseudomonas putida (gene xylL).
- Biphenyl-2,3-dihydro-2,3-diol dehydrogenase (EC 1.3.1.-) (gene bphB) from various Pseudomonaceae.
- Cis-toluene dihydrodiol dehydrogenase (EC 1.3.1.-) from Pseudomonas putida (gene todD).
- Cis-benzene glycol dehydrogenase (EC 1.3.1.19) from Pseudomonas putida (gene bnzE).
- 2,3-dihydro-2,3-dihydroxybenzoate dehydrogenase (EC 1.3.1.28) from Escherichia coli (gene entA) and Bacillus subtilis (gene dhbA).
- Dihydropteridine reductase (EC 1.6.99.7) (HDHPR) from mammals.
- Lignin degradation enzyme ligD from Pseudomonas paucimobilis.
- Agropine synthesis reductase from Agrobacterium plasmids (gene mas1).
- Versicolorin reductase from Aspergillus parasiticus (gene VER1).
- Putative keto-acyl reductases from Streptomyces polyketide biosynthesis operons.
- A trifunctional hydratase-dehydrogenase-epimerase from the peroxisomal beta-oxidation system of Candida tropicalis. This protein contains two tandemly repeated 'short-chain dehydrogenase-type' domains in its N-terminal extremity.
- Nodulation protein nodG from species of Azospirillum and Rhizobium, which is probably involved in the modification of the nodulation Nod factor fatty acyl chain.
- Nitrogen fixation protein fixR from Bradyrhizobium japonicum.
- Bacillus subtilis protein dltE, which is involved in the biosynthesis of D-alanyl-lipoteichoic acid.
- Human follicular variant translocation protein 1 (FVT1).
- Mouse adipocyte protein p27.
- Mouse protein Ke 6.
- Maize sex determination protein TASSELSEED 2.
- Sarcophaga peregrina 25 Kd development-specific protein.
- Drosophila fat body protein P6.
- A Listeria monocytogenes hypothetical protein encoded in the internalins gene region.
- Escherichia coli hypothetical proteins yciK, ydfG, yjgI, yjgU and yohF.
- Bacillus subtilis hypothetical proteins yoxD, ywfD and ywfH.
- Yeast hypothetical proteins YIL124w, YIR035c and YKL055c.
- Fission yeast hypothetical protein SpAC23D3.11.

One of the best conserved regions, which includes two perfectly conserved residues, a tyrosine and a lysine, has been selected as a signature pattern for this family of proteins. The tyrosine residue participates in the catalytic mechanism.

Consensus pattern: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFYR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM] [Y is an active site residue]

[1] … Gonzalez-Duarte R., Jeffery J., Ghosh D. Biochemistry 34:6003-6013(1995).

[2] Villarroya A., Juan E., Egestad B., Joernvall H. Eur. J. Biochem. 180:191-197(1989).
[3] Persson B., Krook M., Joernvall H. Eur. J. Biochem. 200:537-543(1991).

[4] Neidle E.L., Hartnett C., Ornston N.L., Bairoch A., Rekik M., Harayama S. Eur. J. Biochem. 204:113-120(1992).

[0238] 46. Zinc-containing alcohol dehydrogenases signatures

Alcohol dehydrogenase (EC 1.1.1.1) (ADH) catalyzes the reversible oxidation of ethanol to acetaldehyde with the concomitant reduction of NAD [1]. Structurally and catalytically different types of alcohol dehydrogenases are known:
- Zinc-containing 'long-chain' alcohol dehydrogenases.
- Insect-type, or 'short-chain', alcohol dehydrogenases.
- Iron-containing alcohol dehydrogenases.

Zinc-containing ADHs [2,3] are dimeric or tetrameric enzymes that bind two atoms of zinc per subunit. One of the zinc atoms is essential for catalytic activity while the other is not. Both zinc atoms are coordinated by either cysteine or histidine residues; the catalytic zinc is coordinated by two cysteines and one histidine. Zinc-containing ADHs are found in bacteria, mammals, plants and fungi. In most species there is more than one isozyme (for example, humans have at least six isozymes, yeast three, etc.).

A number of other zinc-dependent dehydrogenases are closely related to zinc ADH [4] and are included in this family:
- Xylitol dehydrogenase (EC 1.1.1.9) (D-xylulose reductase).
- Sorbitol dehydrogenase (EC 1.1.1.14).
- Aryl-alcohol dehydrogenase (EC 1.1.1.90) (benzyl alcohol dehydrogenase).
- Threonine 3-dehydrogenase (EC 1.1.1.103).
- Cinnamyl-alcohol dehydrogenase (EC 1.1.1.195) (CAD) [5]. CAD is a plant enzyme involved in the biosynthesis of lignin.
- Galactitol-1-phosphate 5-dehydrogenase (EC 1.1.1.251).
- Pseudomonas putida 5-exo-alcohol dehydrogenase (EC 1.1.1.-) [6].
- Escherichia coli starvation sensing protein rspB.
- Escherichia coli hypothetical proteins yjgB, yjgV and yiiN.
- Several yeast hypothetical proteins.

The signature pattern for this family of enzymes is based on a conserved region that includes a histidine residue which is the second ligand of the catalytic zinc atom. This family also includes NADP-dependent quinone oxidoreductase (EC 1.6.5.5), an enzyme found in bacteria (gene qor), in yeast and in mammals, where, in some species such as rodents, it has been recruited as an eye lens protein and is known as zeta-crystallin. The sequence of quinone oxidoreductase is distantly related to that of other zinc-containing alcohol dehydrogenases, and it lacks the zinc-ligand residues. The torpedo fish and mammalian synaptic vesicle membrane protein vat-1 is related to qor. A specific pattern has been developed for this subfamily.

Consensus pattern: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC] [H is a zinc ligand]

Consensus pattern: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR]

[1] Branden C.-I., Joernvall H., Eklund H., Furugren B. (In) The Enzymes (3rd edition) 11:104-190(1975).
[2] Joernvall H., Persson B., Jeffery J. Eur. J. Biochem. 167:195-201(1987).
[3] Sun H.-W., Plapp B.V. J. Mol. Evol. 34:522-535(1992).

[4] … B., Keraenen S., Penttilae M., Joernvall H. FEBS Lett. 324:9-14(1993).

[5] Knight M.E., Halpin C., Schuch W. Plant Mol. Biol. 19:793-801(1992).

[6] Koga H., Aramaki H., Yamaguchi E., Takeuchi K., Horiuchi T., Gunsalus I.C. J. Bacteriol. 166:1089-1095(1986).
[7] … (1993).

[0239] 47. (aldedh) Aldehyde dehydrogenases active sites 

[0240] Aldehyde dehydrogenases (EC 1.2.1.3 and EC 1.2.1.5) are enzymes which oxidize a wide variety of aliphatic and aromatic aldehydes. In mammals at least four different forms of the enzyme are known [1]: class-1 (or Ald C), a tetrameric cytosolic enzyme; class-2 (or Ald M), a tetrameric mitochondrial enzyme; class-3 (or Ald D), a dimeric cytosolic enzyme; and class IV, a microsomal enzyme. Aldehyde dehydrogenases have also been sequenced from fungal and bacterial species. A number of enzymes are known to be evolutionarily related to aldehyde dehydrogenases; these enzymes are listed below.
- Plant and bacterial betaine-aldehyde dehydrogenase (EC 1.2.1.8) [2], an enzyme that catalyzes the last step in the biosynthesis of betaine.
- Plant and bacterial NADP-dependent glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.9).
- Escherichia coli succinate-semialdehyde dehydrogenase (NADP+) (EC 1.2.1.16) [3], which converts succinate semialdehyde into succinate.
- Escherichia coli lactaldehyde dehydrogenase (EC 1.2.1.22) (gene ald) [4].
- Mammalian succinate semialdehyde dehydrogenase (NAD+) (EC 1.2.1.24).
- Escherichia coli phenylacetaldehyde dehydrogenase (EC 1.2.1.39).
- Escherichia coli 5-carboxymethyl-2-hydroxymuconate semialdehyde dehydrogenase (gene hpcC).
- Pseudomonas putida 2-hydroxymuconic semialdehyde dehydrogenase [5] (genes dmpC and xylG), an enzyme in the meta-cleavage pathway for the degradation of phenols, cresols and catechol.
- Bacterial and mammalian methylmalonate-semialdehyde dehydrogenase (MMSDH) (EC 1.2.1.27) [6], an enzyme involved in valine catabolism.
- Yeast delta-1-pyrroline-5-carboxylate dehydrogenase (EC 1.5.1.12) [7] (gene PUT2), which catalyzes the second step in the conversion of proline to glutamate.
- Bacterial multifunctional putA protein, which contains a delta-1-pyrroline-5-carboxylate dehydrogenase domain.
- 26G, a garden pea protein of unknown function which is induced by dehydration of shoots [8].
- Mammalian formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [9]. This is a cytosolic enzyme responsible for the NADP-dependent oxidative decarboxylation of 10-formyltetrahydrofolate to tetrahydrofolate. It is a protein of about 900 amino acids which consists of three domains; the C-terminal domain (480 residues) is structurally and functionally related to aldehyde dehydrogenases.
- Yeast hypothetical proteins YBR006w, YER073w and YHR039c.
- Caenorhabditis elegans hypothetical protein ….

A glutamic acid and a cysteine residue have been implicated in the catalytic activity of mammalian aldehyde dehydrogenase. These residues are conserved in all the enzymes of this family. Two patterns have been derived for this family, one for each of the active site residues.

Consensus pattern: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV] [E is the active site residue]
Consensus pattern: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR] [C is the active site residue]

[1] Hempel J., Harper K., Lindahl R. Biochemistry 28:1160-1167(1989).
[2] Weretilnyk E.A., Hanson A.D. Proc. Natl. Acad. Sci. U.S.A. 87:2745-2749(1990).
[3] Niegemann E., Schulz A., Bartsch K. Arch. Microbiol. 160:454-460(1993).
[4] Hidalgo E., Chen Y.-M., Lin E.C.C., Aguilar J. J. Bacteriol. 173:6118-6123(1991).
[5] Nordlund I., Shingler V. Biochim. Biophys. Acta 1049:227-230(1990).
[6] Steele M.I., Lorenz D., Hatter K., Park A., Sokatch J.R. J. Biol. Chem. 267:13585-13592(1992).
[7] Krzywicki K.A., Brandriss M.C. Mol. Cell. Biol. 4:2837-2842(1984).
[8] Guerrero F.D., Jones J.T., Mullet J.E. Plant Mol. Biol. 15:11-26(1990).
[9] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).

[0241] 48. Aldo/keto reductase family signatures 

The aldo-keto reductase family [1,2] groups together a number of structurally and functionally related NADPH-dependent oxidoreductases as well as some other proteins. The proteins known to belong to this family are:
- Aldehyde reductase (EC 1.1.1.2).
- Aldose reductase (EC 1.1.1.21).
- 3-alpha-hydroxysteroid dehydrogenase (EC 1.1.1.50), which terminates androgen action by converting 5-alpha-dihydrotestosterone to 3-alpha-androstanediol.
- Prostaglandin F synthase (EC 1.1.1.188), which catalyzes the reduction of prostaglandins H2 and D2 to F2-alpha.
- D-sorbitol-6-phosphate dehydrogenase (EC 1.1.1.200) from apple.
- Morphine 6-dehydrogenase (EC 1.1.1.218) from Pseudomonas putida plasmid pMDH7.2 (gene morA).
- Chlordecone reductase (EC 1.1.1.225), which reduces the pesticide chlordecone (kepone) to the corresponding alcohol.
- 2,5-diketo-D-gluconic acid reductase (EC 1.1.1.-), which catalyzes the reduction of 2,5-diketogluconic acid to 2-keto-L-gulonic acid, a key intermediate in the production of ascorbic acid.
- NAD(P)H-dependent xylose reductase (EC 1.1.1.-) from the yeast Pichia stipitis. This enzyme reduces xylose into xylitol.
- Trans-1,2-dihydrobenzene-1,2-diol dehydrogenase (EC 1.3.1.20).
- 3-oxo-5-beta-steroid 4-dehydrogenase (EC 1.3.1.3), which catalyzes the reduction of delta(4)-3-oxosteroids.
- A soybean reductase, which co-acts with chalcone synthase in the formation of 4,2',4'-trihydroxychalcone.
- Frog eye lens rho crystallin.
- Yeast GCY protein, whose function is not known.
- Leishmania major P110/11E protein. P110/11E is a developmentally regulated protein whose abundance is markedly elevated in promastigotes compared with amastigotes. Its exact function is not yet known.
- Escherichia coli hypothetical protein yghE.
- Yeast hypothetical proteins YBR149w, YHR104w and YJR096w.

These proteins all have about 300 amino acid residues. Three consensus patterns have been developed that are specific to this family of proteins. The first one is located in the N-terminal section of these proteins and the second in the central section. The third pattern, located in the C-terminal section, is centered on a lysine residue whose chemical modification, in aldose and aldehyde reductases, affects the catalytic efficiency.

Consensus pattern: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G
Consensus pattern: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY]

Consensus pattern: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA] [K is a putative active site residue]

[1] Bohren K.M., Bullock B., Wermuth B., Gabbay K.H. J. Biol. Chem. 264:9547-9551(1989).
[2] Bruce N.C., Willey D.L., Coulson A.F.W., Jeffery J. Biochem. J. 299:805-811(1994).

Alpha amylase is classified as family 13 of the glycosyl hydrolases. The structure is an 8-stranded alpha/beta barrel containing the active site, interrupted by a ~70 a.a. calcium-binding domain protruding between beta strand 3 and alpha helix 3, and a carboxyl-terminal Greek key beta-barrel domain.

[1] … A, Cascio D, Day J, McPherson A. J Mol Biol 1994;235:1560-1584.

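[0243] 50. Aminotransferases class-I pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-I, currently consists of the following enzymes:
- Aspartate aminotransferase (AAT) (EC 2.6.1.1). AAT catalyzes the reversible transfer of the amino group from L-aspartate to 2-oxoglutarate to form oxaloacetate and L-glutamate. In eukaryotes, there are two AAT isozymes: one is located in the mitochondrial matrix, the second is cytoplasmic. In prokaryotes, only one form of AAT is found.
- Tyrosine aminotransferase (EC 2.6.1.5), which catalyzes the first reaction in the catabolism of L-tyrosine: the transamination of L-tyrosine and 2-oxoglutarate to 4-hydroxyphenylpyruvate and L-glutamate.
- Aromatic aminotransferase (EC 2.6.1.57), involved in the synthesis of Phe, Tyr, Asp and Leu (gene tyrB).
- 1-aminocyclopropane-1-carboxylate synthase (EC 4.4.1.14) (ACC synthase) from plants. ACC synthase catalyzes the rate-limiting step in ethylene biosynthesis.
- Pseudomonas denitrificans cobC, which is involved in cobalamin biosynthesis.
- Yeast hypothetical protein ….

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: … [K is the pyridoxal-P attachment site]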

[1] Bairoch A. Unpublished observations (1992).

[2] …

[0244] 51. Aminotransferases class-II pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1] into subfamilies. One of these, called class-II, currently consists of the following enzymes:
- Glycine acetyltransferase (EC 2.3.1.29), which catalyzes the addition of acetyl-CoA to glycine to form 2-amino-3-oxobutanoate (gene kbl).
- 5-aminolevulinic acid synthase (EC 2.3.1.37) (delta-ALA synthase), which catalyzes the first step in heme biosynthesis via the Shemin (or C4) pathway: the condensation of succinyl-CoA and glycine to form 5-aminolevulinate.
- 8-amino-7-oxononanoate synthase (EC 2.3.1.47) (7-KAP synthetase), a bacterial enzyme (gene bioF) which catalyzes an intermediate step in the biosynthesis of biotin: the condensation of 6-carboxyhexanoyl-CoA with alanine to form 8-amino-7-oxononanoate.
- Histidinol-phosphate aminotransferase (EC 2.6.1.9), which catalyzes the eighth step in the histidine biosynthetic pathway: the transfer of an amino group from glutamic acid to 3-(imidazol-4-yl)-2-oxopropyl phosphate to form histidinol phosphate and 2-oxoglutarate.
- Serine palmitoyltransferase (EC 2.3.1.50) from yeast (genes LCB1 and LCB2), which catalyzes the condensation of palmitoyl-CoA and serine to form 3-ketosphinganine.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG] [K is the pyridoxal-P attachment site]
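The class-II pattern singles out one lysine as the pyridoxal-P attachment site. The sketch below, a hand-written Python regular expression equivalent to that pattern with a named group on the lysine, shows one way the position of a candidate PLP-binding lysine could be reported. The function name plp_lysine_positions and the toy sequence are invented for illustration, and a pattern hit marks only a candidate site, not confirmed PLP binding.

```python
import re

# Regex equivalent of T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG];
# the named group "plp_lys" captures the lysine that carries pyridoxal phosphate.
CLASS_II_PLP_SITE = re.compile(
    r"T[LIVMFYW][STAG](?P<plp_lys>K)[SAG][LIVMFYWR][SAG].{2}[SAG]"
)


def plp_lysine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of candidate PLP-binding lysines.

    finditer reports non-overlapping matches only, which is sufficient for
    this pattern because each match must start with a threonine.
    """
    return [m.start("plp_lys") + 1 for m in CLASS_II_PLP_SITE.finditer(sequence.upper())]


# Invented toy sequence, not a real aminotransferase.
print(plp_lysine_positions("MGHTLSKSLSAASEE"))
```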

[1] Bairoch A. Unpublished observations (1991).

[0245] 52. Aminotransferases class-III pyridoxal-phosphate attachment site

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-III, currently consists of the following enzymes:
- Acetylornithine aminotransferase (EC 2.6.1.11), which catalyzes the transfer of an amino group from acetylornithine to alpha-ketoglutarate, yielding N-acetyl-glutamic-5-semi-aldehyde and glutamic acid.
- Ornithine aminotransferase (EC 2.6.1.13), which catalyzes the transfer of an amino group from ornithine to alpha-ketoglutarate, yielding glutamic-5-semi-aldehyde and glutamic acid.
- Omega-amino acid-pyruvate aminotransferase (EC 2.6.1.18), which catalyzes transamination between a variety of omega-amino acids, mono- and diamines, and pyruvate. It plays a pivotal role in omega-amino acid metabolism.
- 4-aminobutyrate aminotransferase (EC 2.6.1.19) (GABA transaminase), which catalyzes the transfer of an amino group from GABA to alpha-ketoglutarate, yielding succinate semialdehyde and glutamic acid.
- Adenosylmethionine-8-amino-7-oxononanoate aminotransferase (EC 2.6.1.62) (DAPA aminotransferase), a bacterial enzyme (gene bioA) which catalyzes an intermediate step in the biosynthesis of biotin: the transamination of 7-keto-8-aminopelargonic acid (7-KAP) to form 7,8-diaminopelargonic acid (DAPA).
- 2,2-dialkylglycine decarboxylase (EC 4.1.1.64), a Pseudomonas cepacia enzyme that catalyzes the decarboxylating amino transfer of 2,2-dialkylglycine and pyruvate to dialkyl ketone, alanine and carbon dioxide.
- Glutamate-1-semialdehyde aminotransferase (EC 5.4.3.8) (GSA). GSA is the enzyme involved in the second step of porphyrin biosynthesis, via the C5 pathway. It transfers the amino group on carbon 2 of glutamate-1-semialdehyde to the neighbouring carbon, to give delta-aminolevulinic acid.
- Bacillus subtilis aminotransferase yodT.
- Haemophilus influenzae aminotransferase HI0949.
- Caenorhabditis elegans aminotransferase T01B11.2.

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: [LIVMFYWC](2)-x-D-E-[IVA]-x(2)-G-[LIVMFAGC]-x(0,1)-[RSACL]-x-[GSAD]-x(12,16)-[DH]-[LIVMFC]-[LIVMFYSTA]-x(2)-[GSA]-K-x(3)-[GSTADNV]-[GSAC] [K is the pyridoxal-P attachment site]

[1] Bairoch A. Unpublished observations (1992).
[2] Yonaha K., Nishie M., Aibara S. J. Biol. Chem. 267:12506-12510(1992).

[0246] 53. Ankyrin repeats. … separation between noise and signal on the HMM search.

[1] … A, Holak TA, FEBS Lett 1997;401:127-132.

[2] Lux SE, John KM, Bennett V, Nature 1990;345:736-739. 

[0247] 54. Aminotransferases class-IV signature 

[0248] Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-IV, currently consists of the following enzymes:

- Branched-chain amino-acid aminotransferase (EC 2.6.1.42) (transaminase B), a bacterial (gene ilvE) and eukaryotic enzyme which catalyzes the reversible transfer of an amino group from glutamate to 4-methyl-2-oxopentanoate, to form leucine and 2-oxoglutarate.

- D-alanine aminotransferase (EC 2.6.1.21), a bacterial enzyme which catalyzes the transfer of the amino group from D-alanine (and other D-amino acids) to 2-oxoglutarate, to form pyruvate and D-glutamate.

- 4-amino-4-deoxychorismate lyase (ADC lyase) (gene pabC), a bacterial enzyme involved in the biosynthesis of p-aminobenzoate (PABA).

[0249] The above enzymes are proteins of about 270 to 415 amino-acid residues that share a few regions of sequence similarity. Surprisingly, the best-conserved region does not include the lysine residue to which the pyridoxal-phosphate group is bound, but lies … residues toward the C-terminal side of the PLP-lysine.

Consensus pattern: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-[GS]-[LIVM]-x-[KR]

[1] Green J.M., Merkel W.K., Nichols B.P. J. Bacteriol. 174:5317-5323(1992).
[2] Bairoch A. Unpublished observations (1992).

[0250] 55. Aminotransferases class-V pyridoxal-phosphate attachment site 

Aminotransferases share certain mechanistic features with other pyridoxal-phosphate dependent enzymes, such as the covalent binding of the pyridoxal-phosphate group to a lysine residue. On the basis of sequence similarity, these various enzymes can be grouped [1,2] into subfamilies. One of these, called class-V, currently consists of the following enzymes:
- Phosphoserine aminotransferase (EC 2.6.1.52), an enzyme which catalyzes the reversible interconversion of phosphoserine and 2-oxoglutarate to 3-phosphonooxypyruvate and glutamate. It is required both in the major phosphorylated pathway of serine biosynthesis and in pyridoxine biosynthesis. The bacterial enzyme (gene serC) is highly similar to a rabbit endometrial progesterone-induced protein (EPIP), which is probably a phosphoserine aminotransferase [3].
- Serine-glyoxylate aminotransferase (EC 2.6.1.45) (SGAT) (gene sgaA) from Methylobacterium extorquens.
- Serine-pyruvate aminotransferase (EC 2.6.1.51). This enzyme also acts as an alanine-glyoxylate aminotransferase (EC 2.6.1.44); depending on the species, it is located in the peroxisomes and/or mitochondria.
- Isopenicillin N epimerase (gene cefD). This enzyme is involved in the biosynthesis of cephalosporin antibiotics and catalyzes the reversible isomerization of isopenicillin N and penicillin N.
- Nitrogen fixation protein nifS, a protein of the nitrogen fixation operon of some bacteria. The exact function of nifS is not yet known. A highly similar protein has been found in fungi (gene NFS1 or SPL1).
- The small subunit of cyanobacterial soluble hydrogenase (EC 1.12.-.-).
- Hypothetical proteins ….

The sequence around the pyridoxal-phosphate attachment site of this class of enzyme is sufficiently conserved to allow the creation of a specific pattern.

Consensus pattern: …-x-[LIVMFYSAC] [K is the pyridoxal-P attachment site]

[1] Ouzounis C., Sander C. FEBS Lett. 322:159-164(1993).
[2] Bairoch A. Unpublished observations (1992).

[3] van der Zel A., Lam H.-M., Winkler M.E. Nucleic Acids Res. 17:8379-8379(1989).
[0251] 56. Annexins repeated domain signature 

The annexins (or lipocortins) [1 to 6] are a family of calcium-binding proteins that associate reversibly with membranes. They bind to phospholipid bilayers in the presence of micromolar free calcium concentration. The binding is specific for calcium and for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase inhibition, intracellular signalling, anticoagulation, and membrane fusion. Each of these proteins consists of an N-terminal domain of variable length followed by four or eight copies of a conserved segment of sixty-one residues. The repeat (sometimes known as an 'endonexin fold') consists of five alpha-helices that are wound into a right-handed superhelix. The proteins known to belong to the annexin family are listed below:
- Annexin I (Lipocortin 1) (Calpactin 2) (p35).
- Annexin II (Lipocortin 2) (Calpactin 1) (Chromobindin 8).
- Annexin III (Lipocortin 3).
- Annexin IV (Lipocortin 4) (Endonexin I) (Protein II) (Chromobindin 4).
- Annexin V (Lipocortin 5) (Endonexin II).
- Annexin VI (Protein III) (Chromobindin 20) (p68) (p70). This is the only known annexin that contains 8 (instead of 4) repeats.
- Annexin VII (Synexin).
- Annexin VIII (Vascular anticoagulant-beta) (VAC-beta).
- Annexin IX from Drosophila.
- Annexin X from Drosophila.
- Annexin XI (Calcyclin-associated annexin) (CAP-50).
- Annexin XII from Hydra vulgaris.
- Annexin XIII (Intestine-specific annexin) (ISA).

Consensus pattern: …-x(7)-[LIVMF]-x(3)-[LIVMF]-x(11)-…
[ 1] Raynal P., Pollard H.B. Biochim. Biophys. Acta 1197:63-93(1994).
[ 2] Barton G.J., Newman R.H., Freemont P.S., Crumpton M.J. Eur. J. Biochem. 198:749-760(1991).
[ 3] Burgoyne R.D., Geisow M.J. Cell Calcium 10:1-10(1989).
[ 4] Haigler H.T., Fitch J.M., Jones J.M., Schlaepfer D.D. Trends Biochem. Sci. 14:48-50(1989).
[ 5] Klee C.B. Biochemistry 27:6645-6653(1988).
[ 6] Smith P.D., Moss S.E. Trends Genet. 10:241-246(1994).
[ 7] Huber R., Roemisch J., Paques E.-P. EMBO J. 9:3867-3874(1990).
[ 8] Fiedler K., Simons K. Trends Biochem. Sci. 20:177-178(1995).

[0252] 57. (arf_1) ADP-ribosylation factors family signature

ADP-ribosylation factors (ARFs) are small GTP-binding proteins involved in protein trafficking; they may modulate vesicle budding and uncoating within the Golgi apparatus. ARFs also act as allosteric activators of cholera toxin ADP-ribosyltransferase activity. At least six forms of ARF are present in mammals. Several proteins are highly similar to ARFs but lack the cholera toxin cofactor activity; they are collectively known as ARLs (ARF-like). ARD1 is a 64 Kd protein that contains an ARF domain at its C-terminal extremity. Proteins from the ARF family are generally included in the RAS 'superfamily' of small GTP-binding proteins [5], but they are only slightly similar to RAS proteins. They also differ from RAS proteins in that they lack a cysteine residue at their C-termini and are therefore not subject to prenylation. The ARFs are N-terminally myristoylated (the ARLs have not yet been shown to be modified in such a fashion). A conserved region in the C-terminal part of ARFs and ARLs has been selected as a signature pattern.

Consensus pattern: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-...
Note: proteins belonging to this family also contain a copy of the ATP/GTP-binding motif A (P-loop).
[ 1] Boman A.L., Kahn R.A. Trends Biochem. Sci. 20:147-150(1995).
[ 2] Moss J., Vaughan M. Cell. Signal. 4:367-399(1993).
[ 3] Moss J., Vaughan M. Prog. Nucleic Acid Res. Mol. Biol. 45:47-65(1993).
[ 4] Amor J.C., Harrison D.H., Kahn R.A., Ringe D. Nature 372:704-708(1994).
[ 5] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).

[0253] (arf_2) ATP/GTP-binding site motif A (P-loop) 

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix; this loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in which the P-loop is found. Protein families in which the presence of such a motif has been noted include: - ... (see <PDOC00343>). - Dynamins and dynamin-like proteins (see <PDOC00362>). - Guanylate kinase (see <PDOC00670>). - Thymidine kinase (see <PDOC00524>). - Thymidylate kinase. - ... (see <PDOC00868>). - Nitrogenase iron protein family. - ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>). - DNA and RNA helicases [8,9,10]. - ... - Nuclear protein ran (see <PDOC00859>). - ... (see <PDOC00539>). - Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.). - DNA mismatch repair proteins mutS family (see <PDOC00388>). - Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-loop. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position, Gly is found instead of Ser or Thr.

Consensus pattern: [AG]-x(4)-G-K-[ST]
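The consensus pattern above uses PROSITE syntax. As a minimal sketch (not part of the original disclosure), the following Python helper, `prosite_to_regex` (a hypothetical name), translates that syntax into a regular expression and scans a sequence with it. It assumes only the pattern elements used in this document: `-` separators, `x`, `[...]`, `{...}`, and `(n)` or `(n,m)` repeat counts; the `<` and `>` anchors of full PROSITE syntax are not handled. The example input is the first 22 residues of human H-Ras, whose P-loop is GAGGVGKS.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex.

    Supported elements: 'x' (any residue), [ABC] (allowed residues),
    {ABC} (forbidden residues), single residues, and (n) or (n,m) repeats.
    """
    element_re = r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?"
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = re.fullmatch(element_re, element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)

regex = prosite_to_regex("[AG]-x(4)-G-K-[ST]")
print(regex)                      # [AG].{4}GK[ST]
# First 22 residues of human H-Ras; its P-loop is GAGGVGKS.
hit = re.search(regex, "MTEYKLVVVGAGGVGKSALTIQ")
print(hit.start(), hit.group())   # 9 GAGGVGKS  (0-based offset, i.e. residue 10)
```

Translating to a regex lets the standard `re` module do the scanning; overlapping hits would need a lookahead-based search instead.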

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] ... J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] ... P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122.
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[0254] 58. Arginase family signatures 

The following enzymes have been shown [1] to be evolutionarily related: - Arginase (EC 3.5.3.1), a ubiquitous enzyme which catalyzes the degradation of arginine to ornithine and urea [2]. - Agmatinase (agmatine ureohydrolase), a prokaryotic enzyme (gene speB) that catalyzes the hydrolysis of agmatine into putrescine and urea. - Formiminoglutamase (formiminoglutamate hydrolase), a prokaryotic enzyme (gene hutG) that hydrolyzes N-formimino-L-glutamate into formamide and glutamate. - Hypothetical proteins from methanogenic archaebacteria.

These enzymes are proteins of about 300 amino-acid residues. Three conserved regions that contain charged residues involved in the binding of the two manganese ions [3] can be used as signature patterns.

Consensus pattern: [LIVMF]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA] [H binds manganese]
Consensus pattern: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D [The two D's and the H bind manganese]
Consensus pattern: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G [The two D's bind manganese]
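To show how the bracketed annotations map onto residue positions, here is a hedged Python sketch that captures the three manganese-binding residues of the second arginase pattern, [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D. The regex is a hand translation, the function name `manganese_ligands` is illustrative, and the test fragment is a synthetic string rather than a real arginase sequence.

```python
import re

# [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D, with the two D's and the H captured.
ARGINASE_2 = re.compile(r"[LIVM]{2}.[LIVMFY](D)[AS](H).(D)")

def manganese_ligands(sequence: str) -> list[tuple[str, int]]:
    """Residue letter and 1-based position of each predicted Mn ligand."""
    ligands = []
    for m in ARGINASE_2.finditer(sequence):
        for group in (1, 2, 3):
            ligands.append((m.group(group), m.start(group) + 1))
    return ligands

# Synthetic fragment (not a real arginase) that satisfies the pattern.
print(manganese_ligands("GGLLALDAHAD"))  # [('D', 7), ('H', 9), ('D', 11)]
```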

[ 1] Ouzounis C., Kyrpides N.C. J. Mol. Evol. 39:101-104(1994).
[ 2] Jenkinson C.P., Grody W.W., Cederbaum S.D. Comp. Biochem. Physiol. 114B:107-132(1996).
[ 3] Kanyo Z.F., Scolnick L.R., Ash D.E., Christianson D.W. Nature 383:554-557(1996).
[0255] 59. (asp) Eukaryotic and viral aspartyl proteases active site

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains, each containing an active site centered on a catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain. Known eukaryotic aspartyl proteases include: - Vertebrate gastric pepsins A and C (also known as gastricsin). - Vertebrate chymosin (rennin), involved in digestion and used for making cheese. - Vertebrate lysosomal cathepsins D and E (EC 3.4.23.34). - Mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from angiotensinogen in the plasma. - Fungal proteases such as aspergillopepsin A, candidapepsin (EC 3.4.23.24), mucorpepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin, polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21). - Yeast saccharopepsin (proteinase A) (gene PEP4), which is implicated in the posttranslational regulation of vacuolar hydrolases. - Yeast barrierpepsin (gene BAR1), a protease that acts as an antagonist of mating pheromones.

Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during the maturation process of the virus. The sequence around the active-site aspartate is conserved in both the eukaryotic and the viral proteases, which allows a single signature pattern to be developed for both groups of proteases.

Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]
Note: these proteins belong to families A1 and A2 in the classification of peptidases.
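A minimal Python sketch of reporting the active-site aspartate, assuming the pattern reads [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA]. The regex is a hand translation with the D captured, and the function name `active_site_aspartates` is hypothetical. The test input is the first 35 residues of the HIV-1 protease monomer, whose catalytic aspartate is Asp25; in the active homodimer each chain contributes one such residue.

```python
import re

# Hand translation of the aspartyl protease pattern; the catalytic D is group 1.
ASP_PROTEASE = re.compile(
    r"[LIVMFGAC][LIVMTADN][LIVFSA](D)[ST]G[STAV][STAPDENQ].[LIVMFSTNC].[LIVMFGTA]"
)

def active_site_aspartates(sequence: str) -> list[int]:
    """1-based positions of aspartates sitting at the pattern's active-site D."""
    return [m.start(1) + 1 for m in ASP_PROTEASE.finditer(sequence)]

hiv1_pr_nterm = "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEE"
print(active_site_aspartates(hiv1_pr_nterm))  # [25]
```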

[ 1] Foltmann B. Essays Biochem. 17:52-84(1981).
[ 2] Davies D.R. Annu. Rev. Biophys. Biophys. Chem. 19:189-215(1990).
[ 3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).

[0256] 60. (BIRA) Biotin repressor 

[1] ... 1992;89:9257-9261.

The BTB (for BR-C, ttk and bab) [1] or POZ (for Pox virus and Zinc finger) [2] domain is present near the N-terminus of a fraction of zinc finger (zf-C2H2) proteins and in proteins that contain the Kelch motif. The BTB/POZ domain mediates homomeric dimerisation and in some instances heteromeric dimerisation [2]. The structure of the dimerised PLZF BTB/POZ domain has been solved and consists of a tightly intertwined homodimer. The central scaffolding of the protein is made up of a cluster of alpha-helices flanked by short beta-sheets at both the top and bottom of the molecule [3]. POZ domains from several zinc finger proteins have been shown to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes, including N-CoR and SMRT [4,5,6]. The POZ or BTB domain is also known as BR-C/Ttk or ZiN.

[1] Zollman S, Godt D, Prive GG, Couderc JL, Laski FA; Proc Natl Acad Sci U S A 1994;91:10717-10721.
[2] Bardwell VJ, Treisman R; Genes Dev 1994;8:1664-1677.

[3] Ahmad KF, Engel CK, Prive GG; Proc Natl Acad Sci U S A 1998;95:12123-12128.

[4] ...

[5] Huynh KD, Bardwell VJ; Oncogene 1998;17:2473-2484. 

[6] Wong CW. Privalsky ML; J Biol Chem 1998;273:27695-27702. 

[0258] 62. (Bac GSPproteins) Bacterial type II secretion system protein D signature 

A number of bacterial proteins, some of which are involved in a general secretion pathway (GSP) for the export of proteins (also called the type II pathway) [1 to 5], have been found to be evolutionarily related. These proteins include: - The 'D' protein from the GSP operon of Aeromonas hydrophila (gene exeD); Erwinia (gene outD); Escherichia coli (gene ...); Klebsiella ...; Pseudomonas aeruginosa (gene xcpQ); Vibrio cholerae (gene epsD) and Xanthomonas campestris (gene xpsD). - comE from Haemophilus influenzae, involved in competence (DNA uptake). - pilQ from Pseudomonas aeruginosa, which is essential for the formation of the pili. - hofQ (hopQ) from Escherichia coli.

[ 1] Salmond G.P.C., Reeves P.J. Trends Biochem. Sci. 18:7-12(1993).
[ 2] ...
[ 3] ... J.S. Mol. Microbiol. 9:857-868(1993).
[ 4] Hobbs M., Mattick J.S. Mol. Microbiol. 10:233-243(1993).
[ 5] Genin S., Boucher C.A. Mol. Gen. Genet. 243:112-118(1994). 

[0259] 63. (Bac globin) Protozoan/cyanobacterial globins signature

Consensus pattern: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).
[ 2] Takagi T. Curr. Opin. Struct. Biol. 3:413-418(1993).

[ 3] Couture M., Chamberland H., St-Pierre B.. Lafontaine J.. Guertin M.; Mol. Gen. Genet. 243:185-197(1994). 
[0260] 64. Band 7 protein family signature 

[ 1] Gallagher P.G., Forget B.G. J. Biol. Chem. 270:26358-26363(1995).
[ 2] Huang M.. Gu G., Ferguson E.L, Chalfie M. Nature 378:292-295(1995). 

[0261] 65. Barwin domain signatures

... followed by a barwin-like C-terminal domain. Barwin and its related proteins could be involved in a defense mechanism in plants. As signature patterns, two highly conserved regions that contain some of the cysteines were selected.
Consensus pattern: C-G-[KR]-C-L-x-V-x-N [The two C's are involved in disulfide bonds]
Consensus pattern: V-[DN]-Y-[EQ]-F-V-[DN]-C [C is involved in a disulfide bond]

[ 1] ... Roepstorff P., Ludvigsen S., Poulsen F.M. Biochemistry 31:8767-8770(1992).

[ 2] Potter S., Uknes S., Lawton K., Winter A.M., Chandler D., Dimaio J., Novitzky R., Ward E., Ryals J. Mol. Plant Microbe Interact. 6:680-685(1993).

[0262] 66. (Bowman-Birk leg) Bowman-Birk serine protease inhibitors family signature

The Bowman-Birk inhibitor family [1] is one of the numerous families of serine proteinase inhibitors. As can be seen in the schematic representation, they have a duplicated structure and generally possess two distinct inhibitory sites:



xxCCxxCxxCxx#xxCxxCxxxxCxxxCxxxCxxxxCxx#xxCxxCxxCxxCxx
<------------------------ 70 residues ------------------------>

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.

[0263] These inhibitors are found in the seeds of all leguminous plants as well as in cereal grains. In cereals they exist in two forms, one of which is a duplication of the basic structure shown above [2]. The pattern developed to detect sequences belonging to this family of inhibitors is in the central part of the domain and includes four cysteines.

[0264] Consensus pattern: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C [The four C's are involved in disulfide bonds]
Note: this pattern can be found twice in some duplicated cereal inhibitors.
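Because this pattern can occur twice in duplicated cereal inhibitors, a scan should count all non-overlapping occurrences rather than stop at the first. A minimal Python sketch follows; the regex is a hand translation of C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C, the function name is hypothetical, and the test sequences are synthetic strings built to satisfy the pattern, not real inhibitor sequences.

```python
import re

# Hand translation of the Bowman-Birk consensus pattern.
BOWMAN_BIRK = re.compile(
    r"C.{5,6}[DENQKRHSTA]C[PASTDH][PASTDK][ASTDV]C[NDKS][DEKRHSTA]C"
)

def count_bowman_birk_sites(sequence: str) -> int:
    """Count non-overlapping matches of the Bowman-Birk signature."""
    return sum(1 for _ in BOWMAN_BIRK.finditer(sequence))

# Synthetic motif satisfying the pattern; a duplicated form holds two copies.
motif = "CAAAAASCPPACNDC"
print(count_bowman_birk_sites("MK" + motif + "GG"))            # 1
print(count_bowman_birk_sites("MK" + motif + "GGGG" + motif))  # 2
```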

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980).
[ 2] Tashiro M., Hashino K., Shiozaki M., Ibuki F., Maki Z. J. Biochem. 102:297-306(1987).

[0265] 67. Pathogenesis-related protein Bet v I family signature 

[0266] A number of plant proteins, which all seem to be involved in pathogen defense response, are structurally related [1,2,3]. These proteins are:

- Bet v I, the major pollen allergen from white birch. Bet v I is the main cause of type I allergic reactions in Europe, North America and the USSR.
- Aln g I, the major pollen allergen from alder.
- Api g I, the major allergen from celery.
- Car b I, the major pollen allergen from hornbeam.
- Cor a I, the major pollen allergen from hazel.
- Mal d I, the major allergen from apple.
- Asparagus wound-induced protein AoPR1.
- Kidney bean pathogenesis-related proteins 1 and 2.
- Parsley pathogenesis-related proteins PR1-1 and PR1-3.
- Pea disease resistance response proteins pI49, pI176 and DRRG49-C.
- Pea abscisic acid-responsive proteins ABR17 and ABR18.
- Potato pathogenesis-related proteins STH-2 and STH-21.
- Soybean stress-induced protein SAM22.

[0267] These proteins are thought to be intracellularly located. They contain from 155 to 160 amino acid residues. As a signature pattern, a conserved region located in the third quarter of these proteins has been selected.
Consensus pattern: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(8,9)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-[FY]

[ 1] ...

[ 2] Crowell D., John M.E., Russell D., Amasino R.M. Plant Mol. Biol. 18:459-466(1992).
[ 3] Warner S.A.J., Scott R., Draper J. Plant Mol. Biol. 19:555-561(1992).

[0268] 68. bZIP transcription factors basic domain signature 

The bZIP superfamily [1,2] of eukaryotic DNA-binding transcription factors groups together proteins that contain a basic region mediating sequence-specific DNA binding followed by a leucine zipper required for dimerization. A list of representative members appears here: - Transcription factor AP-1, which binds selectively to enhancer elements in the cis control regions of SV40 and metallothionein IIA. AP-1, also known as c-jun, is the cellular homolog of the avian sarcoma virus 17 (ASV17) oncogene v-jun. - Jun-B and jun-D, probable transcription factors which are highly similar to jun/AP-1. - The fos protein, a proto-oncogene that forms a non-covalent dimer with c-jun. - The fos-related proteins fra-1 and fos B. - Mammalian cAMP response element (CRE) binding proteins CREB, CREM, ATF-1, ATF-3, ATF-4, ATF-5, ATF-6 and LRF-1. - Maize Opaque 2, a trans-acting transcriptional activator involved in the regulation of the production of zein proteins during endosperm development. - ... tobacco TAF-1 and wheat EMBP-1. These proteins bind the G-box promoter elements of many plant genes. - Drosophila protein Giant, which represses the expression of the kruppel and knirps segmentation genes. - Drosophila Box B binding factor 2 (BBF-2), a transcriptional activator that binds to fat body-specific enhancers of alcohol dehydrogenase and yolk protein genes. - Drosophila segmentation protein cap'n'collar (gene cnc), which is involved in head morphogenesis. - Caenorhabditis elegans skn-1, a developmental protein involved in the fate of ventral blastomeres in the early embryo. - Yeast GCN4, which regulates the expression of amino acid biosynthetic enzymes in response to amino acid starvation, and the related Neurospora crassa cpc-1 protein. - Neurospora crassa cys-3, which turns on the expression of structural genes which encode sulfur-catabolic enzymes. - Yeast MET4, a transcriptional activator of sulfur amino acid metabolism. - Yeast PDR4 (or YAP1), a transcriptional activator of the genes for some oxygen detoxification enzymes. - Epstein-Barr virus trans-activator protein BZLF1.
Consensus pattern: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK]
[ 1] ... 2:105-168(1995).
[ 2] Ellenberger T. Curr. Opin. Struct. Biol. 4:12-21(1994).
[0269] 69. Biotin-requiring enzymes attachment site

Biotin, which plays a catalytic role in some carboxyl transfer reactions, is covalently attached, via an amide bond, to a lysine residue in enzymes requiring this coenzyme [1,2,3,4]. Such enzymes are:

- Pyruvate carboxylase (EC 6.4.1.1).

- Acetyl-CoA carboxylase (EC 6.4.1.2).

- Propionyl-CoA carboxylase (EC 6.4.1.3).

- Methylcrotonoyl-CoA carboxylase (EC 6.4.1.4).

- Geranoyl-CoA carboxylase (EC 6.4.1.5).

- Urea carboxylase (EC 6.3.4.6).

- Oxaloacetate decarboxylase (EC 4.1.1.3).

- Methylmalonyl-CoA decarboxylase (EC 4.1.1.41).

- Glutaconyl-CoA decarboxylase (EC 4.1.1.70).

- Methylmalonyl-CoA carboxyltransferase (EC 2.1.3.1) (transcarboxylase).

Sequence data reveal that the region around the biocytin (biotin-lysine) residue is well conserved and can be used as a signature pattern.

Consensus pattern: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-[SAV] [K is the biotin attachment site]
Note: the domain around the biotin-binding lysine residue is evolutionarily related to that around the lipoyl-binding lysine residue of 2-oxo acid dehydrogenase acyltransferases.
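A hedged sketch of locating the biotinylated lysine: the biotin pattern [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-[LMAT]-x(3)-[LIVM]-x-[SAV] is translated by hand into a regex with the K captured as a group, so each match yields the position of the attachment site. The function name is illustrative and the input is a synthetic sequence built to satisfy the pattern.

```python
import re

# Biotin attachment pattern with the biotinylated K captured as group 1.
BIOTIN_SITE = re.compile(
    r"[GN][DEQTR].[LIVMFY].{2}[LIVM].[AIV]M(K)[LMAT].{3}[LIVM].[SAV]"
)

def biotinyl_lysine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of lysines that sit at the pattern's K."""
    return [m.start(1) + 1 for m in BIOTIN_SITE.finditer(sequence)]

# Synthetic sequence built to satisfy the pattern (not a real protein).
seq = "MSS" + "GDALAAIAAMKLAAAIAS"
print(biotinyl_lysine_positions(seq))  # [14]
```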

[ 1] Knowles J.R. Annu. Rev. Biochem. 58:195-221(1989) 

[ 2] Samols D., Thornton C.G., Murtif V.L., Kumar G.K., Haase F.C., Wood H.G. J. Biol. Chem. 263:6461-6464(1988).

[ 3] Goss N.H., Wood H.G. Meth. Enzymol. 107:261-278(1984).
[ 4] Shenoy B.C., Xie Y., Park V.L., Kumar G.K., Beegen H., Wood H.G. ...

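[0271] 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site

The 2-oxo acid dehydrogenase multienzyme complexes [1,2] from bacterial and eukaryotic sources catalyze the oxidative decarboxylation of 2-oxo acids to the corresponding acyl-CoA. Three such complexes are known: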

- Pyruvate dehydrogenase complex (PDC). 

- 2-oxoglutarate dehydrogenase complex (OGDC). 

- Branched-chain 2-oxo acid dehydrogenase complex (BCOADC). 

All three complexes share a common architecture: they are composed of multiple copies of three component enzymes, E1, E2 and E3. E1 is a thiamine pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipamide acyltransferase, and E3 an FAD-containing dihydrolipamide dehydrogenase.
[0272] E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to a lysine group. The E2 components of OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either one (in yeast and in Bacillus), two (in mammals) or three (in Azotobacter and in Escherichia coli) lipoyl groups [3].

Lipoic acid is also found in the following proteins:

- H-protein of the glycine cleavage system (GCS) [4]. GCS is a multienzyme complex of four protein components which catalyzes the degradation of glycine. H-protein shuttles the methylamine group of glycine from the P protein to the T protein. H-protein from either prokaryotes or eukaryotes binds a single lipoic group.

- Mammalian and yeast pyruvate dehydrogenase complexes differ from those of other sources in that they contain a protein of unknown function, designated protein X or component X. Its sequence is related to that of E2 subunits, and it seems to bind a lipoic group [5].
- Fast migrating protein (FMP) (gene acoC) from Alcaligenes eutrophus [6].

This protein is most probably a dihydrolipamide acyltransferase involved in acetoin metabolism.

A signature pattern was developed which allows the detection of the lipoyl-binding site.

Consensus pattern: ...(5)-[GCN]-x-[LIVMFY] [K is the lipoyl-binding site]
Note: the domain around the lipoyl-binding lysine residue is evolutionarily related to that around the biotin-binding lysine residue of biotin-requiring enzymes.

[ 1] Yeaman S.J. Biochem. J. 257:625-632(1989). 

[ 2] Yeaman S.J. Trends Biochem. Sci. 11:293-296(1986). 

[ 3] Russel G.C., Guest J.R. Biochim. Biophys. Acta 1076:225-232(1991) 

[ 4] Fujiwara K., Okamura-Ikeda K., Motokawa Y. J. Biol. Chem. 261:8836-8841(1986).
[ 5] Behal R.H., Browning K.S., Hall T.B., Reed L.J. Proc. Natl. Acad. Sci. U.S.A. 86:8732-8736(1989).
[ 6] Priefert H., Hein S., Krueger N., Zeh K., Schmidt B., Steinbuechel A. J. Bacteriol. 173:4056-4071(1991).

[0274] 70. C2 (C2 domain) Number of members: 295 

Some isozymes of protein kinase C (PKC) [1,2] contain a domain, known as C2, of about 116 amino-acid residues, located between the C1 domain and the protein kinase catalytic domain (see <PDOC00100>). Regions with significant homology to the C2 domain have been found in the following proteins:

- PKC isoforms alpha, beta and gamma, and Drosophila isoforms PKC1 and PKC2.

- PKC isoforms delta, epsilon, eta and theta, as well as yeast PKC1, have a C2-like domain at the N-terminal extremity.

- Yeast cAMP-dependent protein kinase SCH9 contains a C2-like domain.

- Phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms beta, gamma and delta have a C2-like domain C-terminal of the catalytic domain.
- Phosphatidylinositol 3-kinases have a C2-like domain in the central region of the 110 Kd catalytic subunit.
- Yeast phosphatidylserine decarboxylase 2 (gene PSD2) contains a C2 domain in its central region.

- Phospholipase D from plants and cytosolic phospholipase A2 have a C2-like domain at their N-terminus.

- Synaptotagmins (p65). This is a family of related synaptic vesicle proteins that bind acidic phospholipids and that may have a regulatory role in the membrane interactions during trafficking of synaptic vesicles at the active zone of the synapse. All isoforms of synaptotagmins have two copies of the C2 domain in their C-terminal region.

- Rabphilin-3A, a synaptic protein, contains two C2 domains.

- ... unc-13 has a C2 domain in its central part and a C2-like domain at the C-terminus.

- rasGAP and the breakpoint cluster protein bcr have a C2 domain C-terminal of a PH domain.
- Yeast protein BUD2 (or CLA2) has a C2 domain in the central region.

- Yeast protein RSP5 and human protein NEDD-4; both proteins also contain WW domains (see <PDOC50020>).

- Perforin (see <PDOC00251>) has a C2 domain at the C-terminus. It is the only extracellular protein known to contain a C2 domain.

- Yeast hypothetical protein YML072C has a C2 domain.

- Yeast hypothetical protein YNL087W has three C2 domains.

- Caenorhabditis elegans hypothetical protein F37A4.7 has two C2 domains.

The C2 domain is thought to be involved in calcium-dependent phospholipid binding [5]. Since domains related to the C2 domain are also found in proteins that do not bind calcium, other putative functions for the C2 domain, such as binding to inositol-1,3,4,5-tetraphosphate, have been suggested [6]. Recently, the 3D structure of the first C2 domain of synaptotagmin has been reported [7]; the domain forms an eight-stranded beta sandwich. The signature pattern that has been developed for the C2 domain is located in a conserved part of that domain, the connecting loop between beta strands 2 and 3. A profile has been developed for the C2 domain that covers the total domain.

- Consensus pattern: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY]

- Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive than the pattern, you should use it if you have access to the necessary software tools to do so.
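Because the C2 pattern [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY] contains variable gaps, a match can span a range of lengths. The small helper below, `pattern_length_range` (a hypothetical name), computes that range from the pattern string, which can help when sizing sequence windows; it is a sketch that assumes only the element syntax used in this document.

```python
import re

def pattern_length_range(pattern: str) -> tuple[int, int]:
    """Minimum and maximum number of residues a PROSITE pattern can span.

    Each '-'-separated element counts once, or n (or n..m) times when it
    carries a (n) or (n,m) repeat suffix.
    """
    lo_total = hi_total = 0
    for element in pattern.split("-"):
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        if m is None:
            lo, hi = 1, 1
        else:
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) else lo
        lo_total += lo
        hi_total += hi
    return lo_total, hi_total

c2 = "[ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY]"
print(pattern_length_range(c2))  # (15, 17)
```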

Extending the C2 domain family: C2s in PKCs delta, epsilon, eta and theta, phospholipases, GAPs and perforin. Ponting CP, Parker PJ; Protein Sci 1996;5:162-166.

[ 1] Azzi A., Boscoboinik D., Hensey C. Eur. J. Biochem. 208:547-557(1992).
[ 2] Stabel S. Semin. Cancer Biol. 5:277-284(1994).
[ 3] Brose N., Hofmann K.O., Hata Y., Suedhof T.C. J. Biol. Chem. 270:25273-25280(1995).
[ 4] Sossin W.S., Schwartz J.H. Trends Biochem. Sci. 18:207-208(1993).
[ 5] Davletov B.A., Suedhof T.C. J. Biol. Chem. 268:26386-26390(1993).
[ 6] Fukuda M., Aruga J., Niinobe M., Aimoto S., Mikoshiba K. J. Biol. Chem. 269:29206-29211(1994).
[ 7] Sutton R.B., Davletov B.A., Berghuis A.M., Suedhof T.C., Sprang S.R. Cell 80:929-938(1995).

[0276] 71. CAP (CAP protein) Number of members: 11

In yeast, the adenylyl cyclase-associated protein (CAP) is a bifunctional protein whose N-terminal domain binds to adenylyl cyclase, thereby enabling that enzyme to be activated by upstream regulatory signals, such as Ras. The function of the C-terminal domain is less clear, but it is required for normal cellular morphology and growth control [1]. CAP is conserved in higher eukaryotic organisms, where its function is not yet clear [2].

Structurally, CAP is a protein of 474 to 551 residues which consists of two domains separated by a proline-rich hinge. Two signature patterns, one corresponding to a conserved region in the N-terminal extremity and the other to a C-terminal region, have been developed.



Consensus pattern: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E
Consensus pattern: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K
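One way to use the two CAP signatures together is to require both and to check that the N-terminal one precedes the C-terminal one. That ordering rule is an assumption suggested by the description above, not a documented part of the method. The regexes below are hand translations of [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E and D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K, the function name is hypothetical, and the test sequence is synthetic.

```python
import re

# Hand translations of the two CAP signatures.
CAP_N = re.compile(r"[LIVM]{2}.RL[DE].{4}RLE")        # N-terminal signature
CAP_C = re.compile(r"D[LIVMFY].E.[PA].PEQ[LIVMFY]K")  # C-terminal signature

def has_cap_signatures(sequence: str) -> bool:
    """True if both signatures occur, the N-terminal one first (assumed rule)."""
    n_match = CAP_N.search(sequence)
    c_match = CAP_C.search(sequence)
    return bool(n_match and c_match and n_match.end() <= c_match.start())

# Synthetic sequence (not a real CAP) containing both signatures in order.
seq = "M" + "LLARLEAAAARLE" + "G" * 20 + "DLAEAPAPEQLK" + "GG"
print(has_cap_signatures(seq))             # True
print(has_cap_signatures("DLAEAPAPEQLK"))  # False
```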



[ 1] Kawamukai M., Gerst J., Field J., Riggs M., Rodgers L., Wigler M., Young D. Mol. Biol. Cell 3:167-180(1992).
[ 2] Yu G., Swiston J.. Young D. J. Cell Sci. 107:1671-1678(1994). 

[0277] 72. CAP_GLY (CAP-Gly domain)

Swiss:P39937 may be a member but has not been included; it has a weak match to the family between residues 22-67. Number of members: 24.

[1] ... cytoskeleton-associated proteins. Riehemann K, Sorg C.
[0278] It has been shown [1] that some cytoskeleton-associated proteins (CAP) share the presence of a conserved glycine-rich domain of about 42 residues, called here CAP-Gly. Proteins known to contain this domain are listed below:

- Restin (also known as cytoplasmic linker protein-170 or CLIP-170), a 160 Kd protein associated with intermediate filaments and that links endocytic vesicles to microtubules. Restin contains two copies of the CAP-Gly domain.

- Vertebrate dynactin (150 Kd dynein-associated polypeptide; DAP) and Drosophila glued, a major component of activator I, a 20S polypeptide complex that stimulates dynein-mediated vesicle transport.

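- Yeast protein BIK1, which seems to be required for the formation or stabilization of microtubules during mitosis and for spindle pole body fusion during conjugation.

- Yeast protein NIP100 (NIP80).

- ..., Schizosaccharomyces pombe protein alp11 and a Caenorhabditis elegans hypothetical protein. These proteins contain an N-terminal ubiquitin-like domain (see <PDOC00271>) and a C-terminal CAP-Gly domain.

- Caenorhabditis elegans hypothetical protein M01A8.2.

- Yeast hypothetical protein YNL148c.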

In most of these proteins, the region containing the CAP-Gly domain is most probably globular and is followed by a large region predicted to be in an alpha-helical coiled-coil conformation and finally by a short C-terminal globular domain. The signature for the CAP-Gly domain corresponds to the first 32 residues of the domain and includes five of the six conserved glycines.

Consensus pattern: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F
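The statement that the signature includes five of the six conserved glycines can be checked directly against the pattern string G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F. The sketch below (the helper name is hypothetical) lists the pattern elements that are an invariant G. Note that it returns element indices, not residue positions, because x(8,10) spans a variable number of residues.

```python
def fixed_glycine_positions(pattern: str) -> list[int]:
    """1-based indices of pattern elements that are an invariant glycine."""
    return [i for i, element in enumerate(pattern.split("-"), start=1)
            if element == "G"]

cap_gly = ("G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]"
           "-x(2)-G-x(2)-[LY]-F")
print(len(fixed_glycine_positions(cap_gly)))  # 5
print(fixed_glycine_positions(cap_gly))       # [1, 5, 10, 14, 17]
```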

[ 1] Riehemann K.. Sorg C. Trends Biochem. Sci. 18:82-83(1993) 
[0279] 73. (CBD1) Cellulose-binding domain, fungal type

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1]. Many of these enzymes have a catalytic domain joined to a cellulose-binding domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids. The fungal type of CBD is found in the following enzymes:

- Endoglucanase I (gene egl1) from Trichoderma reesei.

- Endoglucanase II (gene egl2) from Trichoderma reesei.

- Endoglucanase V (gene egl5) from Trichoderma reesei.

- Exocellobiohydrolase I (gene CBHI) from ..., Neurospora crassa, Phanerochaete chrysosporium, Trichoderma reesei, and Trichoderma viride.

- Exocellobiohydrolase II (gene CBHII) from Trichoderma reesei.

- Exocellobiohydrolase 3 (gene cel3) from Agaricus bisporus.

- Endoglucanases B, C2, F and K from Fusarium oxysporum.

The CBD domain is found either at the N-terminal (Cbh-II or egl2) or at the C-terminal extremity (Cbh-I, egl1 or egl5) of these enzymes. As shown in the following schematic representation, there are four conserved cysteines in this type of CBD domain, all involved in disulfide bonds.

xxxxxxxCxxxxxxxxxxCxxxxxCxxxxxxxxxCx

'C': conserved cysteine involved in a disulfide bond.

[0283] Such a domain has also been found in a putative polysaccharide binding protein from the red alga Porphyra purpurea [2]. Structurally, this protein consists of four tandem repeats of the CBD domain.

[0284] Consensus pattern: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHG]-x-[FYWM]-x(2)-Q-C [The four C's are involved in disulfide bonds]. Sequences known to belong to this class detected by the pattern: ALL.

[ 1] ... 303-315(1991).

[0285] 74. CBS domain. 3D structure found as a subdomain in the TIM barrel of inosine monophosphate dehydrogenase. CBS domain web page. CBS domains are small intracellular modules mostly found in two or four copies within a protein. CBS domains are found in ... homocystinuria. Two CBS domains are found in inosine monophosphate dehydrogenase from all species; however, the CBS domains are not needed for activity. ... to homocystinuria. Number of members: 414.

[1] The structure of a domain common to archaebacteria and the homocystinuria disease protein. Bateman A; Trends Biochem Sci 1997;22:12-13.

[2] Medline: 96279836. Structure and mechanism of inosine monophosphate dehydrogenase in complex with the immunosuppressant mycophenolic acid. Sintchak MD, Fleming MA, Futer O, Raybuck SA, Chambers SP, Caron PR, Murcko MA, Wilson KP; Cell 1996;85:921-930.

Discovery of CBS domain.

[0286] 75. CDP-OH_P_transf (CDP-alcohol phosphatidyltransferase) 

All of these members have the ability to catalyze the displacement of CMP from a CDP-alcohol by a second alcohol, with formation of a phosphodiester bond and concomitant breaking of a phosphoanhydride bond.

A number of phosphatidyltransferases, which are all involved in phospholipid biosynthesis and share the property of catalyzing the displacement of CMP from a CDP-alcohol by a second alcohol with formation of a phosphodiester bond, share a conserved sequence region [1,2]. These enzymes are:

- Ethanolaminephosphotransferase (EC 2.7.8.1) from yeast (gene EPT1).

- Diacylglycerol cholinephosphotransferase (EC 2.7.8.2) from yeast (gene CPT1).

- ...

- Phosphatidylserine synthase (CDP-diacylglycerol-serine O-phosphatidyltransferase) from yeast.
- Phosphatidylinositol synthase (CDP-diacylglycerol-inositol 3-phosphatidyltransferase) from yeast.

These enzymes are proteins of from 200 to 400 amino acid residues. The conserved region contains three aspartic acid residues and is located in the N-terminal section of the sequences.

Consensus pattern: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D

... transmembrane peptides from Escherichia coli phosphatidylglyc...

[ 1] Nikawa J.-I., Kodaki T., Yamashita S. J. Biol. Chem. 262:4876-4881(1987).
[ 2] Hjelmstad R.H., Bell R.M. J. Biol. Chem. 266:5094-5134(1991).

(Cholesterol oxidase) Members of the GMC oxidoreductase family. Number of members: 3

[1] ... complexed with a steroid substrate: implications for flavin adenine dinucleotide dependent alcohol oxidases. Li J, Vrielink A, Brick P, Blow DM; Biochemistry 1993;32:11507-11515.

The following FAD flavoprotein oxidoreductases have been found to be evolutionarily related. These enzymes are:

- ...

- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.

- Cholesterol oxidase (CHOD) (EC 1.1.3.6) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.

- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans, which converts aliphatic medium-chain-length alcohols into aldehydes.

This family also includes a lyase:

- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis.

These enzymes are proteins of size ranging from [...]

[...] signature patterns. The first one is [...]

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992). 

[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).

[ 3] van Beilen J.B., Eggink G., Enequist H., Bos R., Witholt B. Mol. Microbiol. 6:3121-3136(1992).
[ 4] Cheng I.R, Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993). 

77. (Cyclin-dependent kinase regulatory subunit)
Number of members: 11

Cyclin-dependent kinases (CDK) are protein kinases which associate with cyclins to regulate eukaryotic cell cycle progression. The most well [...] S-phase [...]. CDKs bind to a regulatory subunit which is essential for their biological function. This regulatory subunit is a small protein of 79 to 150 residues. [...] a single isoform is known, while mammals have [...] acts as a hub for the oligomerization of six CDK catalytic subunits. The sequence of CDK regulatory subunits is highly conserved; therefore, the two most conserved regions have been used as signature patterns.

- Consensus pattern: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP]

- Consensus pattern: H-x-P-E-x-H-[IV]-L-L-F-[KR]

[ 1] [...] Tainer J.A. Science 262:387-395(1993).

[0292] 78. CK_II_beta (Casein kinase II regulatory subunit)

[...] serine/threonine protein kinase which [...] the cytoplasm and the nucleus and whose substrates are numerous. It generally phosphorylates [...]

[...] most species there are two closely related [...] fungi and plants express [...] forms [...] regulatory subunits [...]

[...] such as zinc [2]. This region has been used as a signature pattern.

- Consensus pattern: C-P-x-[LIVMY]-x-C-x(5)-[LI]-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C
[ 1] Allende J.E., Allende C.C. FASEB J. 9:313-323(1995).
[ 2] Reed J.C., Bidwai A.P., Glover C.V.C. J. Biol. Chem. 269:18192-18200(1994).
[0293] 79. CLP_protease (Clp protease)

These proteins belong to family S14 in the classification of peptidases.

- The Clp protease has an active site catalytic triad. In E. coli Clp protease, Ser-111, His-136 and Asp-185 form the catalytic triad.

- Swiss:P48254 has lost all of these active site residues and is therefore inactive.

- Swiss:P42379 contains two large insertions; Swiss:P42380 contains one large insertion.

Number of members: 38

[0294] The endopeptidase Clp (EC 3.4.21.92) from Escherichia coli cleaves peptides in various proteins in a process that requires ATP hydrolysis [1,2]. Clp is a dimeric protein which consists of a proteolytic subunit (gene clpP) and either of two related ATP-binding regulatory subunits (genes clpA and clpX). ClpP is a serine protease which has a chymotrypsin-like activity. Its catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. Proteases highly similar to ClpP have been found to be encoded in the chloroplast genome of plants and also seem to be present in other eukaryotes. The sequences around two of the residues involved in the catalytic triad (a serine and a histidine) are highly conserved and can be used as signature patterns specific to that category of proteases.

- Consensus pattern: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA] [S is the active site residue]

- Consensus pattern: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P [H is the active site residue]
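Both Clp signatures flag a catalytic residue inside the matched stretch. The following is a minimal sketch in Python. It assumes a hand translation of the first pattern into a regular expression, with a capture group around the serine flagged as the active-site residue, and reports where that residue falls in a query sequence. The function name and the test sequence are illustrative only.

```python
import re

# Hand translation of T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA];
# capture group 1 marks the serine given as the active-site residue.
CLPP_SERINE = re.compile(r"T.{2}[LIVMF]G.A[SAC](S)[MSA][PAG][STA]")


def active_site_positions(seq: str) -> list[int]:
    """Return 1-based positions of the flagged serine for each non-overlapping match."""
    return [m.start(1) + 1 for m in CLPP_SERINE.finditer(seq)]


# Made-up sequence: the pattern starts at residue 3, so the serine is residue 11.
print(active_site_positions("MMTAAIGQAASMPS"))  # [11]
```

Reporting the residue position, rather than just the match, is what makes such a scan useful for checking whether a homolog has kept its catalytic serine.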

[1] Medline: 98050920. The structure of ClpP at 2.3 angstroms resolution suggests a model for ATP-dependent proteolysis. Wang J, Hartling JA, Flanagan JM; Cell 1997;91:447-456.

[ 1] Maurizi M.R., Clark W.P., Kim S.-H., Gottesman S. J. Biol. Chem. 265:12546-12552(1990).

[2] Gottesman S., Maurizi M.R. Microbiol. Rev. 56:592-621(1992). 

[ 3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[0295] 80. CNG_membrane (Transmembrane region cyclic Nucleotide Gated Channel) 

[1] Medline: 94224763. Cyclic nucleotide-gated channels: an expanding new family of ion channels. Yau KW; Proc Natl Acad Sci USA 1994;91:3481-3483.

This family is found N-terminal to the cNMP_binding domain. Number of members: 56. Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [1-3]. The best studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp), where such a domain is known to be composed of three alpha-helices and a distinctive eight-stranded, antiparallel beta-barrel structure. Such a domain is known to exist in the following proteins:

- Prokaryotic catabolite gene activator protein (CAP). 

- cAMP- and cGMP-dependent protein kinases (cAPK and cGPK). Both types of kinases contain two tandem copies of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits: a catalytic chain and a regulatory chain which contains both copies of the domain. The cGPKs are single-chain enzymes that include the two copies of the domain in their N-terminal section. The nucleotide specificity of cAPK and cGPK is due to an amino acid in the conserved region of beta-barrel 7: a threonine that is invariant in cGPK is an alanine in most cAPK.

- Vertebrate cyclic nucleotide-gated ion channels. Two such cation channels have been fully characterized. One is found in rod cells, where it plays a role in visual signal transduction. It specifically binds to cGMP, leading to an opening of the channel and thereby causing a depolarization of rod photoreceptors. In olfactory epithelium a similar cAMP-binding channel plays a role in odorant signal transduction.

There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be essential for maintenance of the structural integrity of the beta-barrel. Two signature patterns have been developed for this domain. The first pattern is located within beta-barrels 2 and 3 and contains the first two conserved Gly. The second pattern is located within beta-barrels 6 and 7 and contains the third conserved Gly as well as the three other invariant residues.

- Consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

- Consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]
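The second cyclic-nucleotide-binding pattern contains a variable-length gap, x(5,11). A single regular-expression search stops at one admissible gap length, so a sequence that fits the pattern in two ways yields only one hit. The following is a minimal Python sketch. It assumes a hand translation of the pattern into regex fragments; the names HEAD, TAIL and gap_hits are illustrative. It tries each gap length separately and uses a zero-width lookahead so that hits may overlap.

```python
import re

# Hand translation of [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]
HEAD = "[LIVMF]GE.[GAS][LIVM]"
TAIL = "R[STAQ]A.[LIVMA].[STACV]"


def gap_hits(seq: str, low: int = 5, high: int = 11) -> list[tuple[int, int]]:
    """Return every (0-based start, gap length) at which the pattern fits seq."""
    hits = []
    for gap in range(low, high + 1):
        # The lookahead consumes nothing, so hits at nearby starts can overlap.
        probe = "(?=" + HEAD + ".{" + str(gap) + "}" + TAIL + ")"
        hits.extend((m.start(), gap) for m in re.finditer(probe, seq))
    return sorted(hits)


# Made-up sequence that fits with a 5-residue gap and with an 8-residue gap.
seq = "LGEAGL" + "AAAAA" + "RSARAASLAS"
print(gap_hits(seq))  # [(0, 5), (0, 8)]
```

On this sequence, a plain re.search with a combined `.{5,11}` gap returns only the greedy, longest-gap fit (gap 8). Enumerating the gap lengths exposes both alignments.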

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).
[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).
[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).
[0296] 81. COX10_ctaB_cyoE (Cytochrome c oxidase assembly factor)
Medline: 95191390

Biosynthesis and functional role of haem O and haem A 
Mogi T, Saiki K, Anraku Y;
Mol Microbiol 1994;14:391-398. 

[...] The complexity of this enzyme requires assistance in building the [...]

This is carried out by the cytochrome c oxidase assembly factor.
Number of members: 31 

[...] of a number of proteins that either act as chaperonins to help the subunits of the enzyme to fold correctly, or assist in the assembly of the metal centers [1]. One of these proteins is known as COX10 in yeast and as ctaB [2] in [...] prokaryotes [...] transmembrane segments. The most conserved region is located in a loop between the second and third of these segments and has been selected as a signature pattern.

- Consensus pattern: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G

[ 1] Nobrega M.P., Nobrega F.G., Tzagoloff A.

J. Biol. Chem. 265:14220-14226(1990).
[ 2] Cao J., Hosler J., Shapleigh J., Revzin A., Ferguson-Miller S.

J. Biol. Chem. 267:24273-24278(1992).
[ 3] Chepuri V., Gennis R.B. 

J. Biol. Chem. 265:12978-12986(1990). 

[0299] 82. COX3 (Cytochrome c oxidase subunit III) 
This family corresponds to chains C and P.

Medline: 96216288 
The whole structure of the 13-subunit oxidized cytochrome c 
oxidase at 2.8 A. 
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H,
Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
Number of members: 224 

[0300] 83. COX5B (Cytochrome c oxidase subunit Vb) 
[1] 

Medline: 96216288 

The whole structure of the 13-subunit oxidized cytochrome c oxidase at 2.8 A.
Tsukihara T, Aoyama H, Yamashita E, Tomizaki T, Yamaguchi H, Shinzawa-Itoh K, Nakashima R, Yaono R, Yoshikawa S;
Science 1996;272:1136-1144.
This family consists of chains F and S.
Number of members: 10 

Cytochrome c oxidase [...] is a component of the respiratory chain complex and is involved in the transfer of electrons from cytochrome c to oxygen. In eukaryotes this enzyme complex is located in the mitochondrial inner membrane; in aerobic prokaryotes it is found in the plasma membrane. In addition to the three large subunits that form the catalytic center of the enzyme complex, there are, in eukaryotes, a variable [...]

[...] of subunit Vb is well conserved and includes [...] to bind zinc [2]. Two of these cysteines are clustered in the C-terminal section of the subunit; this region has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L [The two C's probably bind zinc]

[ 1] Capaldi R.A., Malatesta R., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-148(1983).

[ 2] Rizzuto R., Sandona D., Brini M., Capaldi R.A., Bisson R. Biochim. Biophys. Acta 1129:100-104(1991).

[0302] 84. COesterase (Carboxylesterases)
Cholinesterase pages

The prints entry is specific to acetylcholinesterase 
Number of members: 273 

[0303] Higher eukaryotes have many distinct esterases. Among the different types are those which act on carboxylic esters (EC 3.1.1.-). Carboxylesterases have been classified into three categories (A, B and C) on the basis of differential patterns of inhibition by organophosphates. The sequences of a number of type-B carboxylesterases indicate [1,2,3] that the majority are evolutionarily related. This family currently consists of the following proteins:

- Acetylcholinesterase (EC 3.1.1.7) (AChE) [E1] from vertebrates and from Drosophila.

- Mammalian cholinesterase II (butyrylcholinesterase) (EC 3.1.1.8). Acetylcholinesterase and cholinesterase II are closely related enzymes that hydrolyze choline esters [4].

- Mammalian liver microsomal carboxylesterases (EC 3.1.1.1).

- Drosophila esterase 6, produced in the anterior ejaculatory duct of the male insect reproductive system, where it plays an important role in its reproductive biology.

- Drosophila esterase P.

- Culex pipiens (mosquito) esterases B1 and B2. 

- Myzus persicae (peach-potato aphid) esterases E4 and FE4.

- Mammalian bile-salt-activated lipase (BAL) [5], a multifunctional lipase which catalyzes fat and vitamin absorption. It is activated by bile salts in the infant intestine, where it helps to digest milk fats.

- Insect juvenile hormone esterase (JH esterase) (EC 3.1.1.59).

- Lipases (EC 3.1.1.3) from the fungi Geotrichum candidum and Candida rugosa.

- Caenorhabditis gut esterase (gene ges-1).

- Duck fatty acyl-CoA hydrolase, medium chain (EC 3.1.2.14), an enzyme that may be associated with peroxisome proliferation and may play a role in the production of 3-hydroxy fatty acid diester pheromones.

- Membrane-enclosed crystal proteins from slime mold. These proteins are most probably esterases; the vesicles where they are found have therefore been termed esterosomes.

[0304] So far two bacterial proteins have been found to belong to this family: 

- Phenmedipham hydrolase (phenylcarbamate hydrolase), an Arthrobacter oxidans plasmid-encoded enzyme (gene pcd) that degrades the phenylcarbamate herbicides phenmedipham and desmedipham by hydrolyzing their central carbamate linkages.

- Para-nitrobenzyl esterase from Bacillus subtilis (gene pnbA).

[0305] The following proteins, while having lost their catalytic activity, contain a domain evolutionarily related to that of type-B carboxylesterases:

- Thyroglobulin (TG), a glycoprotein specific to the thyroid gland, which is the precursor of the iodinated thyroid hormones thyroxine (T4) and triiodothyronine (T3).

- Drosophila protein neuractin (gene nrt) which may mediate or modulate cell adhesion between embryonic cells 
during development. 

- Drosophila protein glutactin (gene glt), whose function is not known.

[0306] As is the case for lipases and serine proteases, the catalytic apparatus of esterases involves three residues (catalytic triad): a serine, a glutamate or aspartate, and a histidine. The sequence around the active site serine is well conserved and can be used as a signature pattern. A conserved region located in the N-terminal section and containing a cysteine involved in a disulfide bond has been selected as a second signature pattern.

- Consensus pattern: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G [S is the active site residue]

- Consensus pattern: [ED]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR] [C is involved in a disulfide bond]

[ 1] Myers M., Richmond R.C., Oakeshott J.G. Mol. Biol. Evol. 5:113-119(1988).

[ 2] Krejci E., Duval N., Chatonnet A., Vincens P., Massoulie J. Proc. Natl. Acad. Sci. U.S.A. 88:6647-6651(1991).
[ 3] Cygler M., Schrag J.D., Sussman J.L., Harel M., Silman I., Gentry M.K., Doctor B.P. Protein Sci. 2:366-382(1993).

[ 4] Lockridge O. BioEssays 9:125-128(1988).

[ 5] Wang C.-S., Hartsuck J.A. Biochim. Biophys. Acta 1166:1-19(1993).
[0307] 85. CPSase_L_chain (Carbamoyl-phosphate synthase (CPSase)) 
Medline: 94347758 

Three-dimensional structure of the biotin carboxylase subunit of acetyl-CoA carboxylase.
Waldrop GL, Rayment I, Holden HM;

Biochemistry 1994;33:10249-10256.
[1] 

Medline: 90285162 

[...]

Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.

The small chain has a GATase domain in the carboxyl terminus 
See GATase. 
Number of members: 181 

[...] the biotin-dependent enzymes acetyl-CoA carboxylase [...]

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]

- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R.

J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B.

BioEssays 15:157-164(1993).

[0314] 86. CPSase_sm_chain (Carbamoyl-phosphate synthase small chain, CPSase domain)
[1]

Medline: 90285162 

Simmer JP, Kelly RE, Rinker AG Jr, Scully JL, Evans DR;
J Biol Chem 1990;265:10395-10402.

The carbamoyl-phosphate synthase domain is in the amino terminus of the protein.
Carbamoyl-phosphate synthase catalyzes the ATP-dependent synthesis of carbamyl-phosphate from glutamine or [...] the biosynthesis of arginine and [...]
The carbamoyl-phosphate synthase (CPS) enzyme in prokaryotes is a heterodimer of a small and a large chain. [...]

The small chain has a GATase domain in the carboxyl terminus 

See GATase. 

Number of members: 46 

[...] ATP-dependent synthesis of carbamyl-phosphate [...] urea cycle and the biosynthesis of arginine and pyrimidines.

[...] purines. In bacteria such as Escherichia coli, a single enzyme is involved in both biosynthetic pathways, while other bacteria have separate enzymes. The bacterial enzymes are formed of two subunits: a small chain (gene carA) that provides glutamine amidotransferase activity (GATase) necessary for removal of the ammonia group from glutamine, and a [...]

[...] (genes CPA1 and CPA2). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme, called URA2 in yeast, rudimentary in Drosophila and CAD in mammals [2]. The CPSase [...]

[...] has arisen from the duplication of an ancestral subdomain of about 600 amino acids. Each subdomain independently binds to ATP, and it is suggested that [...]

[...] the biotin-dependent enzymes acetyl-CoA carboxylase [...] and urea carboxylase [...]

- Consensus pattern: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-[STA]-x(3)-[SG]-G-x-[AG]

- Consensus pattern: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KR]-[LIVMSTAC]

[ 1] Simmer J.P., Kelly R.E., Rinker A.G. Jr., Scully J.L., Evans D.R. J. Biol. Chem. 265:10395-10402(1990).
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[0321] 87. CRAL_TRIO (CRAL/TRIO domain)
[1] 

Medline: 98121119 

Crystal structure of the Saccharomyces cerevisiae phosphatidylinositol-transfer protein.

Sha B, Phillips SE, Bankaitis VA, Luo M; 
Nature 1998;391:506-510. 

[...] structure of Sec14 [...]

Number of members: [...]

[0322] 88. CSD (Cold-shock DNA-binding domain)

Medline: 94255482
Crystal structure of CspA, the major cold shock protein of Escherichia coli.
Schindelin H, Jiang W. Inouye M, Heinemann U; 
Proc Natl Acad Sci U S A 1994;91:5119-5123. 
Number of members: 121 

[...] prokaryotic and eukaryotic DNA-binding proteins [1,2,3,E1]. This domain, which is known as the 'cold-shock domain' (CSD), is present in the proteins listed below:

- [...] in response to low temperature (cold-shock protein) [...] binds to and stimulates the transcription of the CCAAT-containing promoters of the H-NS protein and [...]

- Mammalian Y box binding protein 1 (YB1), a protein that binds to the CCAAT-containing Y box of mammalian HLA class II genes.

- [...] to the CCAAT-containing Y box of [...] hsp genes.

- Xenopus B box binding protein (YB3). YB3 binds the B box promoter element of genes transcribed by RNA polymerase III.

- [...] A (EFI-A) (dbpB), a protein that also binds to the CCAAT motif in various gene promoters.

- DbpA, a human DNA-binding protein of unknown specificity.

- Bacillus subtilis cold-shock proteins cspB and cspC. 
- Streptomyces clavuligerus protein SC 7.0.

- Escherichia coli proteins cspB, cspC, cspD, cspE and cspF. 

- [...] upstream of the N-ras gene. Unr contains nine repeats that are similar to the CSD. The function of Unr is not yet known, but it could be a multivalent DNA-binding protein.

[...] conserved region which is located in its N-terminal section has been selected. It must be noted that the beginning of this region is highly similar [4] to the RNP-1 RNA-binding motif.

- Consensus pattern: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY]

[ 1] Doniger J., Landsman D., Gonda M.A., Wistow G.

New Biol. 4:389-395(1992). 
[ 2] Wistow G. 

Nature 344:823-824(1990).
[ 3] Jones P.G., Inouye M.

Mol. Microbiol. 11:811-818(1994). 
[ 4] Landsman D. 

Nucleic Acids Res. 20:2861-2864(1992). 

[0325] 89. CTF_NFI (CTF/NF-I family) 
Number of members: 45 

Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) [1,2] (also known as TGGCA-binding proteins) are a family of vertebrate nuclear proteins which recognize and bind, as dimers, the palindromic DNA sequence [...]. CTF/NF-I binding sites are present in viral and cellular promoters and in the origin of DNA replication of Adenovirus type 2.

[...] proteins that activate the replication [...] NF-II and NF-III) [3]. The family of proteins was also identified as the CTF transcription factors, before the NFI and CTF families were found to be identical [4]. The CTF/NF-I proteins [...] transcription and DNA replication. The CTF/NF-I family name has also been [...]
[0328] In a given species, there is a large number of different CTF/NF-I proteins. The multiplicity of CTF/NF-I is [...] of four different genes. The [...] of NF-I genes have been classified as:

- The CTF-like factors subfamily (prototype form: CTF-1) [4].

- The NFI-X proteins.

- The NFI-A proteins.

- The NFI-B proteins.

[...] transcription and replication activities.

[0330] CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200-amino-acid sequence, almost perfectly [...] sequenced, mediates site-specific DNA recognition, protein dimerization and Adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain. This [...]

[...] transcription factors and with histone H3 [6].

[0331] A perfectly conserved, highly charged 12-residue peptide located in the N-terminal part of CTF/NF-I has been
selected as a specific signature for this family of proteins. 

- Consensus pattern: R-K-R-K-Y-F-K-K-H-E-K-R 
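The description calls this signature peptide highly charged, and a residue count makes the point concrete. The following is a minimal Python sketch that assumes a deliberately crude model. Lys and Arg count +1, Asp and Glu count -1, and His and the peptide termini are ignored; no pKa or pH dependence is modelled. The function name is illustrative.

```python
def crude_net_charge(peptide: str) -> int:
    """Net side-chain charge counting K/R as +1 and D/E as -1 (His and termini ignored)."""
    positive = sum(peptide.count(aa) for aa in "KR")
    negative = sum(peptide.count(aa) for aa in "DE")
    return positive - negative


signature = "RKRKYFKKHEKR"  # R-K-R-K-Y-F-K-K-H-E-K-R with the hyphens removed
print(len(signature), crude_net_charge(signature))  # 12 7
```

Eight of the twelve residues are Lys or Arg, against a single Glu. The peptide is therefore strongly basic under this approximation.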

[ 1] Mermod N., O'Neill E.A., Kelly T.J., Tjian R.

Cell 58:741-753(1989). 
[ 2] Rupp R.A.W., Kruse U., Multhaup G., Goebel U., Beyreuther K.,

Sippel A.E. 

Nucleic Acids Res. 18:2607-2616(1990). 
[ 3] Nagata K., Guggenheimer R.A., Enomoto T., Lichy J.H., Hurwitz J.

Proc. Natl. Acad. Sci. U.S.A. 79:6438-6442(1982). 
[ 4] Santoro C., Mermod N., Andrews P.C., Tjian R.

Nature 334:218-224(1988).
[ 5] Gil G., Smith J.R., Goldstein J.L., Slaughter C.A., Orth K., Brown M.S., Osborne T.F.

Proc. Natl. Acad. Sci. U.S.A. 85:8963-8967(1988).
[ 6] Alevizopoulos A., Dusserre Y., Tsai-Pflugfelder M., von der Weid T., Wahli W., Mermod N.

Genes Dev. 9:3051-3066(1995).

[0332] 90. Calsequestrin (Calsequestrin) 
Number of members: 13

[0333] Calsequestrin is a moderate-affinity, high-capacity calcium-binding protein of cardiac and skeletal muscle [1] [...] sarcoplasmic reticulum terminal cisternae. Calsequestrin acts as a [...] excitation-contraction coupling. It is a highly acidic protein of [...] binds more than 40 moles of calcium per mole of protein. There are at least two [...] expressed in cardiac muscle and another in skeletal muscle. Both forms have highly similar sequences.

[0334] Two signature sequences have been developed. The first corresponds to the N-terminus of the mature protein; the second is located just in front of the C-terminus of the protein, which is composed of a highly acidic tail of variable length.



- Consensus pattern: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V

- Consensus pattern: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D



[0335] [ 1] Treves S., Vilsen B., Chiozzi P., Andersen J.P., Zorzato F.

Biochem. J. 283:767-772(1992). 
[0336] 91. Carboxyl_trans (Carboxyl transferase domain)

[1] 

Medline: 93374821 

Primary structure of the monomer of the 12S subunit of transcarboxylase as deduced from DNA and characterization of the product expressed in Escherichia coli.

Thornton CG, Kumar GK, Haase FC, Phillips NF, Woo SB, Park VM,

Magner WJ, Shenoy SC, Wood HG, Samols D; J Bacteriol 1993;175:5301-5308.
[2] 

Medline: 93358891 
Molecular evolution of biotin-dependent carboxylases. 

Toh H, Kondo H, Tanabe T; 

Eur J Biochem 1993;215:687-696. 
All of the members in this family are biotin-dependent carboxylases.

The carboxyl transferase domain carries out the following reaction: transcarboxylation from biotin to an acceptor molecule. There are two types of carboxyl transferase: one of them uses acyl-CoA and the other uses a 2-oxo acid as the acceptor molecule of carbon dioxide.
All of the members in this family utilise acyl-CoA as the acceptor molecule.
Number of members: 47 

[0337] 92. Chal_sti_synt (Chalcone and stilbene synthases)
Number of members: 146 

In addition to the plant enzymes, this family also includes Bacillus subtilis bcsA.



[ 1] Schroeder J., Schroeder G. 

Z. Naturforsch. 45C:1-8(1990).
[ 2] Lanz T., Tropf S., Marner F.-J., Schroeder J., Schroeder G.

J. Biol. Chem. 266:9971-9976(1991). 

[0341] 93. Chorismate_synt (Chorismate synthase)
Number of members: 19 



[0343] Chorismate synthase from various sources shows [1,2] a high degree of conservation. It is a protein of about 360 to 400 amino-acid residues. [...]

- Consensus pattern: G-E-S-H-[GC]-x(2)-[LIVMHGTV]-x-[LIVM](2)-[DE]-G-x-[PV]

- Consensus pattern: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-[...]

- Consensus pattern: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[...]

[ 1] Schaller A., Schmid J., Leibinger U., Amrhein N.

J. Biol. Chem. 266:21434-21438(1991).
[ 2] Jones D.G.L., Reusser U., Braus G.H.

Mol. Microbiol. 5:2143-2152(1991). 

[0344] 94. Clat_adaptor_s (Clathrin adaptor complex small chain)
Number of members: 21

[...]

[...] the plasma membrane. Both AP-1 and AP-2 are [...]
[0347] A conserved region in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM](2)-Y-[KR]-x(4)-L-Y-F 

[ 1] Pearse B.M., Robinson M.S.

Annu. Rev. Cell Biol. 6:151-171(1990). 
[ 2] Kirchhausen T, Davis A.C., Frucht S., O'Brine Greco B., Payne G.S., 
Tubb B. 

J. Biol. Chem. 266:11153-11157(1991). 
[ 3] Nakai M., Takada T., Endo T.

Biochim. Biophys. Acta 1174:282-284(1993). 
[ 4] Phan H.L., Finlay J.A., Chu D.S., Tan P.K., Kirchhausen T., Payne G.S.

EMBO J. 13:1706-1717(1994). 
[ 5] Kuge O., Hara-Kuge S., Orci L., Ravazzola M., Amherdt M., Tanigawa G.,
Wieland F.T., Rothman J.E.

J. Cell Biol. 123:1727-1734(1993).

[0348] 95. Clathrin_lg_ch (Clathrin light chain)
Number of members: 8 

[0349] Clathrin [1,2] is the major coat-forming protein that encloses vesicles such as coated pits and forms cell surface patches involved in membrane traffic within eukaryotic cells. The clathrin coats (called triskelions) are composed of three heavy chains (180 Kd) and three light chains (23 to 27 Kd).

[0350] The clathrin light chains [3], which may help to properly orient the assembly and disassembly of the clathrin coats, bind non-covalently to the heavy chain; they also bind calcium and interact with the hsc70 uncoating ATPase.

- In higher eukaryotes two genes code for distinct but related light chains: LC(a) and LC(b). Each of the two genes can yield, by tissue-specific alternative splicing, two separate forms which differ by the insertion of a sequence of thirty or eighteen residues, respectively. There is, in the N-terminal part of the clathrin light chains, a domain of twenty-one amino acid residues which is perfectly conserved in LC(a) and LC(b).

- In yeast there is a single light chain (gene CLC1) whose sequence is only distantly related to that of higher eukaryotes.

[0351] Two signature patterns have been developed for clathrin light chains. The first pattern is a heptapeptide from the center of the conserved N-terminal region of eukaryotic light chains; the second pattern is derived from a positively charged region located at the C-terminal extremity of all known clathrin light chains.

- Consensus pattern: F-L-A-Q-Q-E-S

[ 1] Keen J.H.
Annu. Rev. Biochem. 59:415-438(1990).

[ 2] Brodsky F.M.

Science 242:1396-1402(1988).
[ 3] Brodsky F.M., Hill B.L., Acton S.L., Naethke I., Wong D.H.,
Ponnambalam S., Parham P.
Trends Biochem. Sci. 16:208-213(1991).

[0352] 96. (Clathrin repeat) 7-fold repeat in Clathrin and VPS 

Each repeat is about 140 amino acids long. The repeats occur in the arm region of the Clathrin heavy chain. 
Number of members: 79 

[1]

Medline: 92191269 

Folding and trimerization of clathrin subunits at the triskelion hub. 
Nathke IS, Heuser J, Lupas A. Stock J, Turck CW, Brodsky FM; 
Cell 1992;68:899-910.
[2]
Medline: 88097376

Clathrin heavy chain: molecular cloning and complete primary structure. 
Kirchhausen T, Harrison SC, Chow EP, Mattaliano RJ,
Ramachandran KL, Smart J, Brosius J;
Proc Natl Acad Sci U S A 1987;84:8805-8809.
[0353] 97. Collagen (Collagen triple helix repeat (20 copies)) 
[1] Medline: 94059583 
New members of the collagen superfamily.
Mayne R, Brewton RG;
Curr Opin Cell Biol 1993;5:883-890.
Scurvy is associated with collagens. 
Members of this family belong to the collagen superfamily [1]. 

Collagens are generally extracellular structural proteins involved in formation of connective tissue structure. 
The alignment contains 20 copies of the G-X-Y repeat that forms a triple helix. The first position of the repeat is glycine; the second and third positions can be any residue but are frequently proline and hydroxyproline. Collagens are post-translationally modified by proline hydroxylase to form the hydroxyproline residues. Defective hydroxylation is the cause of scurvy.
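The G-X-Y rule lends itself to a simple scan. The following is a minimal Python sketch. It assumes a plain one-letter sequence in which glycine is 'G'; hydroxyproline has no standard one-letter code, so the X and Y positions are not checked. The function name is illustrative. It reports the longest uninterrupted run of Gly-X-Y triplets in any of the three reading frames. A run of 20 or more triplets would span one copy of the 20-repeat alignment described above.

```python
def longest_gxy_run(seq: str) -> int:
    """Length, in triplets, of the longest uninterrupted Gly-X-Y run in seq."""
    best = 0
    for frame in range(3):
        run = 0
        # Step through complete triplets only; i + 2 stays inside the sequence.
        for i in range(frame, len(seq) - 2, 3):
            run = run + 1 if seq[i] == "G" else 0
            best = max(best, run)
    return best


print(longest_gxy_run("GPP" * 20))   # 20
print(longest_gxy_run("AGPPGPPA"))   # 2
```

Scanning all three frames matters because a triple-helical stretch can begin at any residue of the chain.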

Some members of the collagen superfamily are not involved in connective tissue structure but share the same triple 

helical structure. 

Number of members: 2125 

[0354] 98. Coprogen_oxidas (Coproporphyrinogen III oxidase) 
Number of members: 12 

Coproporphyrinogen III oxidase (EC 1.3.3.3) (coproporphyrinogenase) [1,2] catalyzes the oxidative decarboxylation of coproporphyrinogen III into protoporphyrinogen IX, a common step in the pathway for the biosynthesis of porphyrins such as heme, chlorophyll or cobalamin.

[0355] Coproporphyrinogen III oxidase is an enzyme that requires iron for its activity. A cysteine seems to be important
for the catalytic mechanism [3]. Sequences from a variety of eukaryotic and prokaryotic sources show that this enzyme 
has been evolutionarily conserved. A highly conserved region in the central part of the sequence has been selected 
as a signature pattern. This region contains the only conserved cysteine and is rich in charged amino acids. 

- Consensus pattern: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D

[ 1] Xu K., Elliott T.

J. Bacteriol. 175:4990-4999(1993).
[ 2] Kohno H., Furukawa T., Yoshinaga T., Tokunaga R., Taketani S.

J. Biol. Chem. 268:21359-21363(1993).
[ 3] Camadro J.M., Chambon H., Jolles J., Labbe P.

Eur. J. Biochem. 156:579-587(1986).
[ 4] Xu K., Elliott T.

J. Bacteriol. 176:3196-3203(1994). 

[0356] 99. Corona_nucleoca (Coronavirus nucleocapsid protein) 
[1] 

Medline: 98087828 
Identification of a specific interaction between the
coronavirus mouse hepatitis virus A59 nucleocapsid protein 
and packaging signal. 
Molenkamp R, Spaan WJ; 
Virology 1997;239:78-86. 
Number of members: 44 
[0357] 100. Cu-oxidase (Multicopper oxidase) 
[1]

Medline: 90126844 

The blue oxidases, ascorbate oxidase, laccase and ceruloplasmin. 
Modelling and structural relationships.
Messerschmidt A, Huber R; 
Eur J Biochem 1990;187:341-352. 
Number of members: 150 

[0358] Multicx>pper oxidases [1 ,2] are enzymes that possess three spectroscopically different copper centers. These 
centers are called: type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). The enzymes that belong to 
this family are: 
- Ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme.

- Ceruloplasmin (EC 1.16.3.1) (ferroxidase), a protein found in the serum of mammals and birds which oxidizes a great variety of inorganic and organic substances.

- Blood coagulation factor V (Fa V).

- Blood coagulation factor VIII (Fa VIII).

- Yeast FET3 [3], which is required for ferrous iron uptake.

- Yeast hypothetical protein YFL041w and SpAC1F7.08, the fission yeast homolog.

- Consensus pattern: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW]

[0362] 101. Cullin (Cullin family) 
Number of members: 24 

[0363] The following proteins are collectively termed cullins [1]:

- Fission yeast hypothetical protein SpAC24H6.03.

- Consensus pattern: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-...

[ 1] Kipreos E.T., Lander L.E., Wing J.P., He W.W., Hedgecock E.M.
Cell 85:829-839(1996).

[ 2] Burnatowska-Hledin M.A., Spielman W.S., Smith W.L., Shi R., Meyer J.M.,

Dewitt D.L.

Am. J. Physiol. 268:F1198-F1210(1995).
[ 3] Mathias N., Johnson S.L., Winey M., Adams A.E., Goetsch L., Pringle J.R.,

Byers B., Goebl M.G.

Mol. Cell. Biol. 16:6634-6643(1996).

[0365] 102. (Cu_amine_oxid) 
Copper amine oxidase signatures

Amine oxidases (AO) are enzymes that catalyze the oxidation of a wide range of biogenic amines including many
neurotransmitters, histamine and xenobiotic amines. There are two classes of amine oxidases: flavin-containing
(EC 1.4.3.4) and copper-containing (EC 1.4.3.6).
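For reference, the overall oxidative deamination of a primary amine by an amine oxidase can be written in standard form. Here R is a generic side chain, a placeholder introduced for this sketch. Molecular oxygen is reduced to hydrogen peroxide, and the amino nitrogen is released as ammonia.

```latex
\mathrm{R{-}CH_2{-}NH_2} + \mathrm{O_2} + \mathrm{H_2O} \longrightarrow
\mathrm{R{-}CHO} + \mathrm{NH_3} + \mathrm{H_2O_2}
```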

[0366] Copper-containing AO is found in bacteria, fungi, plants and animals. It is a homodimeric enzyme that binds
one copper ion per subunit as well as a 2,4,5-trihydroxyphenylalanine quinone (or topaquinone) (TPQ) cofactor. This
cofactor is derived from a tyrosine residue.

[0367] Two signature patterns were derived for copper AO: the first one contains the tyrosine which gives rise to the
TPQ cofactor, while the second one contains one of the three histidines that bind the copper atom [2].
[0368] Consensus pattern: [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN] [The first Y gives rise to TPQ].
Sequences known to belong to this class detected by the pattern: ALL.

Consensus pattern: ...-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P [H is a copper ligand]. Sequences known to belong to this class
detected by the pattern: ALL, except for lentil AO.
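The first copper AO signature marks the tyrosine that is converted into the TPQ cofactor. The sketch below translates that pattern by hand into a Python regex and puts a capture group around the tyrosine, so each match reports the residue's position. The helper name and the toy sequence are hypothetical. A hit is only a candidate TPQ site, not proof of one.

```python
import re

# Hand translation of the first copper amine oxidase signature,
# [LIVM]-[LIVMA]-[LIVMF]-x(4)-[ST]-x(2)-N-Y-[DE]-[YN],
# with a capture group around the tyrosine that gives rise to TPQ.
TPQ_SITE = re.compile(r"[LIVM][LIVMA][LIVMF].{4}[ST].{2}N(Y)[DE][YN]")


def tpq_tyrosine_positions(sequence: str) -> list[int]:
    """Return 1-based positions of candidate TPQ-precursor tyrosines."""
    return [m.start(1) + 1 for m in TPQ_SITE.finditer(sequence)]


# Hypothetical toy sequence; not a real amine oxidase.
toy_sequence = "MK" + "LALAAAASAANYDY" + "GG"
print(tpq_tyrosine_positions(toy_sequence))  # [14]
```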

[ 1] ... Dekker, New York.

[ 2] Parsons M.R., Convery M.A., Wilmot C.M., Yadav K.D.S., Blakeley V., Corner A.S., Phillips S.E.V., McPherson
M.J., Knowles P.F. Structure 3:1171-1184(1995).

[0370] 103. Cys-protease (Cysteine protease) 
Number of members: 358 

The members of this family are proteolytic enzymes which contain an active site
cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an
asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family
are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an
N-terminal catalytic domain and a C-terminal calcium-binding domain.

- Mammalian cathepsin K, which seems to be involved in osteoclastic bone resorption [3].

- Human cathepsin O [4].

- Bleomycin hydrolase, an enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).

- ... actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and
proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape ...;
... low-temperature induced; Arabidopsis thaliana A494, RD19A and RD21A.

- House-dust mite allergens DerP1 and EurM1.

- Cysteine proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6),
Schistosoma mansoni (antigen SM31) and japonicum (antigen SJ31), Haemonchus contortus (genes AC-1 and
AC-2), and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2. 

- Cruzipain from Trypanosoma cruzi and brucei. 

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.

- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.

- Baculoviruses cathepsin-like enzyme (v-cath). 

- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-like domain 

- Yeast thiol protease BLH1/YCP1/LAP3 

- Caenorhabditis elegans hypothetical protein C06G4.2. a calpain-like protein. 
[0372] Two bacterial peptidases are also part of this family: 

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5]. 
- Thiol protease tpr from Porphyromonas gingivalis.
[0373] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- ... the active site cysteine is replaced ...
... a LIM-domain protein (see ...).

[ 1] Dufour E. 

Biochimie 70:1335-1342(1988). 
[ 2] Kirschke H., Barrett A.J., Rawlings N.D.

Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J.

FEBS Lett. 357:129-134(1995).

[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C.

J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C.

Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M.

Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 244:461-486(1994).

J Mol Biol 1996;262:202-224.
[1] Medline: 99059720

Crystal structure of Escherichia coli cystathionine gamma-synthase at 1.5 A resolution.

Clausen T, Huber R, Prade L, Wahl MC, Messerschmidt A;
EMBO J 1998;17:6827-6838.
Database Reference: SCOP; 1cs1; fa; [SCOP-USA][CATH-PDBSUM]

This family includes enzymes involved in cysteine and methionine metabolism. The following are members:

Cystathionine gamma-lyase,
Cystathionine gamma-synthase,
Cystathionine beta-lyase,
Methionine gamma-lyase,
OAH/OAS sulfhydrylase,
O-succinylhomoserine sulfhydrylase.

All of these members participate in slightly different reactions.

All these enzymes use PLP (pyridoxal 5'-phosphate) as a cofactor.

Number of members: 52 
- ... (gene metC), ... biosynthesis of methionine from cysteine in bacteria.

- ... the transformation of methionine into ...

- O-succinylhomoserine sulfhydrylase (EC 4.2.99.-).

- Yeast hypothetical protein YGL184c.

- Yeast hypothetical protein YHR112c.

... used as a signature pattern ...

[0378] 105. Cyt_reductase
FAD/NAD-binding cytochrome reductase
Number of members: 60 
[1] Medline: 95111952 

Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821. 
[2] Medline: 92084635 

J Biol Chem 1991;266:23542-23547 
[0379] 106. Cytidylyltrans
Phosphatidate cytidylyltransferase
Number of members: 21

- Consensus pattern: ...-K-R-x(4)-...-x-H-...-D-R-...

[ 1] Sparrow C.R., Raetz C.R.H.

J. Biol. Chem. 260:12084-12091(1985).
[ 2] Shen H., Heacock P.N., Clancey C.J., Dowhan W.

J. Biol. Chem. 271:789-795(1996).
[ 3] Saito S., Goto K., Tonosaki A., Kondo H.

J. Biol. Chem. 272:9503-9509(1997). 
Number of members: 64

Cyclic nucleotide-binding domain signatures and profile. Proteins that bind cyclic nucleotides (cAMP or cGMP) share
a structural domain of about 120 residues. ...

Both types of kinases contain two tandem copies of the cyclic nucleotide-binding domain. The cAPKs are composed ...

First consensus pattern: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G

Second consensus pattern: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-...-[STACV]

[ 1] Weber I.T., Shabb J.B., Corbin J.D. Biochemistry 28:6122-6127(1989).

[ 2] Kaupp U.B. Trends Neurosci. 14:150-157(1991).

[ 3] Shabb J.B., Corbin J.D. J. Biol. Chem. 267:5723-5726(1992).

[0384] 109. (cadherin) 

Cadherins extracellular repeated domain signature 

Epithelial (E-cadherin) (also known as uvomorulin or L-CAM) (CDH1) 
Neural (N-cadherin) (CDH2). 
Placental (P-cadherin) (CDH3). 
Retinal (R-cadherin) (CDH4). 
Vascular endothelial (VE-cadherin) (CDH5). 
Kidney (K-cadherin) (CDH6). 
Cadherin-8 (CDH8). 
Osteoblast (OB-cadherin) (CDH11). 
Brain (BR-cadherin) (CDH12). 
T-cadherin (truncated cadherin) (CDH13). 
Muscle (M-cadherin) (CDH14). 
Liver-intestine (LI-cadherin).
EP-cadherin. 

... in the extracellular repeats suggested that ...
Desmoglein 1 (desmosomal glycoprotein I).
Desmoglein 2.
Desmoglein 3 (Pemphigus vulgaris antigen).

The signature pattern developed for the repeated domain is located in its C-terminal extremity, which is its best
conserved region. The pattern includes two conserved aspartic acid residues as well as two asparagines; these residues
could be implicated in the binding of calcium.

... copies of the repeated domain. In the third copy,
there is a deletion of one residue after the second conserved Asp.

[ 1] Takeichi M. Annu. Rev. Biochem. 59:237-252(1990).
[ 2] Takeichi M. Trends Genet. 3:213-217(1987).

[ 3] Mahoney P.A., Weber U., Onofrechuk P., Biessmann H., Bryant P.J., Goodman C.S. Cell 67:853-868(1991).
[0390] 110. Calreticulin family signatures

Calreticulin [1] is a high-capacity calcium-binding protein which is present at the periphery of the endoplasmic (ER)
and the sarcoplasmic reticulum (SR) membranes. It probably plays a role in the storage of calcium in the lumen of the
ER and SR, and it may well have other important functions. Structurally, calreticulin is a protein of about 400 amino
acid residues consisting of three domains:

a) An N-terminal domain of about 180 amino acid residues (N-domain);
b) A central domain of about 70 residues (P-domain) which contains three repeats of an acidic 17 amino acid motif.
This region binds calcium with a low capacity but a high affinity;
c) A C-terminal domain rich in acidic residues and in lysine (C-domain). This region binds calcium with a high
capacity but a low affinity.

Calreticulin is evolutionarily related to the following proteins:

- Onchocerca volvulus antigen RAL-1. RAL-1 is highly similar to calreticulin, but possesses a C-terminal domain rich
in lysine and arginine and lacks acidic residues; it is therefore not expected to bind calcium in that region.
- Calnexin, which seems to play a major role in the quality control apparatus of the ER by the retention of
incorrectly folded proteins.
- Calmegin, a protein highly similar to calnexin.

Three signature patterns have been developed for this family of proteins. The first two patterns are based on
conserved regions in the N-domain; the third pattern corresponds to positions 4 to 16 of the repeated motif in the
P-domain.

Consensus pattern: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2)

Consensus pattern: [LIVM](2)-F-G-P-D-x-C-[AG]

Consensus pattern: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN]
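When the same family is described by several signatures, a sequence can be tested against each one, and the set of patterns that hit can be reported. The sketch below does this for the three calreticulin-family patterns, translated by hand into Python regular expressions. The labels ("N-domain 1" and so on), the helper name and the toy sequences are hypothetical conveniences for illustration.

```python
import re

# Hand translations of the three calreticulin-family signatures above.
CALRETICULIN_SIGNATURES = {
    "N-domain 1": r"[KRHN].[DEQN][DEQNK].{3}CGG[AG][FY][LIVM][KN][LIVMFY]{2}",
    "N-domain 2": r"[LIVM]{2}FGPD.C[AG]",
    "P-domain repeat": r"[IV].D.[DENST].{2}KP[DEH]DW[DEN]",
}


def matching_signatures(sequence: str) -> list[str]:
    """Names of the signatures that occur at least once in `sequence`."""
    return [name for name, rx in CALRETICULIN_SIGNATURES.items()
            if re.search(rx, sequence)]


# Hypothetical toy sequence carrying only the second and third signatures.
toy_sequence = "MA" + "LLFGPDKCG" + "SS" + "IADADAAKPEDWD" + "TT"
print(matching_signatures(toy_sequence))  # ['N-domain 2', 'P-domain repeat']
```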

[ 1] Michalak M., Milner R.E., Burns K., Opas M. Biochem. J. 285:681-692(1992).

[ 2] ... D.B. Trends Biochem. Sci. 19:124-128(1994).

[ 3] ... Chem. 269:...

[0391] 111. Eukaryotic-type carbonic anhydrases signature (carb_anhydrase) 

Carbonic anhydrases (CA) [1,2,3,4] are zinc metalloenzymes which catalyze the reversible hydration of carbon
dioxide. Several different but evolutionarily related forms of carbonic anhydrase are currently known to exist in
mammals, among them membrane-bound forms (CA-IV and CA-VII), a mitochondrial form (CA-V), a secreted salivary form
(CA-VI) and a yet uncharacterized isozyme [5]. In the alga Chlamydomonas reinhardtii, two forms of CA have been
sequenced [6]. They are periplasmic glycoproteins evolutionarily related to vertebrate CAs. Some bacteria, such as
Neisseria gonorrhoeae [7], also have a eukaryotic-type CA.

CAs contain a single zinc atom bound to three conserved histidine residues. As a signature for CAs, a pattern has
been developed which includes one of these zinc-binding histidines. Protein D8 from vaccinia and other poxviruses is
evolutionarily related to CAs but has lost some of the conserved residues. A CA-like domain is also found in the
extracellular domain of some receptor-type tyrosine-protein phosphatases (see ...).

Consensus pattern: ...-E-[HN]-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVMGA]-H-[LIVMFA]... [The second H is a zinc ligand]
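The reversible hydration of carbon dioxide catalyzed by CAs, mentioned at the start of this entry, is written in standard form below. The zinc-bound hydroxide generated at the active site is the nucleophile in this reaction. The equation shows only the overall stoichiometry, not the mechanism.

```latex
\mathrm{CO_2} + \mathrm{H_2O} \rightleftharpoons \mathrm{HCO_3^-} + \mathrm{H^+}
```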
[ 1] Deutsch H.F. Int. J. Biochem. 19:101-113(1987).
[ 2] Fernley R.T. Trends Biochem. Sci. 13:356-359(1988).
[ 3] Tashian R.E. BioEssays 10:186-192(1989).
[ 4] Edwards Y. Biochem. Soc. Trans. 18:171-175(1990).

[0392] 112. Caseins alpha/beta signature

The alpha/beta caseins are a rapidly diverging family of proteins. However, two regions are conserved: a cluster of
phosphorylated serine residues and the signal sequence. The signature pattern has been developed for this family of
proteins based upon the last eight residues of the signal sequence.

Consensus pattern: C-L-[LV]-A-x-A-[LVF]-A
[ 1] Holt C., Sawyer L. Protein Eng. 2:251-259(1988).
[0393] 113. Catalase signatures 



Catalase (EC 1.11.1.6) is an enzyme that decomposes hydrogen peroxide to molecular oxygen and water. In eukaryotic
organisms and in some prokaryotes, catalase is a molecule in which each subunit binds one protoheme IX group. A
conserved region that includes a tyrosine residue which participates in heme binding has been used as a first
signature pattern. A conserved histidine has been shown to be important for the catalytic activity; the region
around this residue has been selected as a second signature pattern.
Consensus pattern: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-... [Y is the proximal heme-binding ligand]
Consensus pattern: [IF]-x-[RH]-x(4)-...-R-x(2)-H-x(2)-...
Note: some prokaryotic catalases belong to the peroxidase family (see ...).
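The decomposition of hydrogen peroxide described in this entry has the following overall stoichiometry. Two molecules of peroxide are consumed per turnover: one is reduced to water, and the other is oxidized to molecular oxygen.

```latex
2\,\mathrm{H_2O_2} \longrightarrow 2\,\mathrm{H_2O} + \mathrm{O_2}
```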

[ 1] ... M.G. J. Mol. Biol. 152:465-499(1981).

[ 2] ... Murthy M.R.N., Rossmann M.G. J. Mol. Biol. 188:63-72(1986).

[ 3] von Ossowski I., Hausner G., Loewen P.C. J. Mol. Evol. 37:71-76(1993).

[0394] 114. (chitin_binding) Chitin recognition or binding domain signature

A common binding domain has been found in a number of proteins. This domain may be involved in the recognition or
binding of chitin subunits. It has been found in the proteins listed below:

- A number of non-leguminous plant lectins. The best characterized of these lectins are the three very similar wheat
germ agglutinins (WGA-1, 2 and 3). WGA is an N-acetylglucosamine/N-acetylneuraminic acid binding lectin whose
structure consists of four repeats of a 43 amino acid domain. The same type of structure is found in a barley
root-specific lectin as well as a rice lectin.

- Plant endochitinases (EC 3.2.1.14) from class I. Endochitinases are enzymes that catalyze the hydrolysis of the
beta-1,4 linkages of N-acetyl-D-glucosamine polymers of chitin. Plant chitinases function as a defense against
chitin-containing fungal pathogens. These chitinases contain one copy of the domain at their N-terminal extremity.
An exception is agglutinin/chitinase [4] from the stinging nettle Urtica dioica, which contains two copies of the
domain.

- Hevein [5], a wound-induced protein found in the latex of rubber trees.

- Win1 and Win2, two wound-induced proteins from potato.

- Kluyveromyces lactis killer toxin alpha subunit. The toxin encoded by the linear plasmid pGKL1 is composed of three
subunits: alpha, beta and gamma. The gamma subunit has toxin activity and inhibits growth of sensitive yeast strains.
The alpha subunit is proteolytically processed from a larger precursor.

In chitinases, as well as in the potato wound-induced proteins, the chitin-binding domain directly follows the signal
sequence and is therefore at the N-terminal extremity of the mature protein; in the killer toxin alpha subunit it is
located in the central section of the protein. The domain contains eight conserved cysteine residues which have all
been shown, in WGA, to be involved in disulfide bonds. The topological arrangement of the four disulfide bonds is
shown in the following figure:

xxCgx...xCxxxxCCsxxgxCgxxxxxCxxxCxxxxC

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.
- Consensus pattern: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C [The five C's are involved in disulfide bonds]
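The chitin-binding pattern contains a variable gap, x(4,5), and five of its fixed positions are cysteines. In a regex, that range becomes the bounded repeat `.{4,5}`. The sketch below lists the positions of the cysteines in each match and shows that both allowed gap lengths are accepted. The helper name and the two toy sequences are hypothetical. The helper reports every C inside a match, so in a real sequence a cysteine that happens to fall in an 'x' position would also be listed.

```python
import re

# Hand translation of C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C;
# the x(4,5) range becomes the bounded repeat .{4,5}.
CHITIN_BINDING = re.compile(r"C.{4,5}CCS.{2}G.CG.{4}[FYW]C")


def pattern_cysteines(sequence: str) -> list[list[int]]:
    """For each match, list the 1-based positions of every cysteine in it."""
    return [
        [m.start() + i + 1 for i, residue in enumerate(m.group()) if residue == "C"]
        for m in CHITIN_BINDING.finditer(sequence)
    ]


# Hypothetical toy sequences differing only in the x(4,5) gap length.
short_gap = "MA" + "CAAAACCSAAGACGAAAAFC"
long_gap = "MA" + "CAAAAACCSAAGACGAAAAFC"
print(pattern_cysteines(short_gap))  # [[3, 8, 9, 15, 22]]
print(pattern_cysteines(long_gap))   # [[3, 9, 10, 16, 23]]
```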

[ 1] Wright H.T., Sandrasegaram G., Wright C.S. J. Mol. Evol. 33:283-294(1991).
[ 2] Lerner D.R., Raikhel N.V. J. Biol. Chem. 267:11085-11091(1992).

[ 3] Butler A.R., O'Donnell R.W., Martin V.J., Gooday G.W., Stark M.J.R. Eur. J. Biochem. 199:483-488(1991).
[0395] 115. (Chitinase 1) Chitinases family 19 signatures

The first pattern includes a cysteine which is conserved in most, if not all, of these chitinases and which is
probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[0396] 116. chloroa_b-bind

Chlorophyll A-B binding proteins. Number of members: 211
[0397] 117. chromo 

b) Proteins with a single chromo domain. 

c) Proteins with paired tandem chromo domains. 

[0398] Currently, this domain has been found in the following proteins:
[0399] Class A.

- Drosophila heterochromatin protein Su(var)205 (HP1).

- Human heterochromatin protein HP1 alpha.
- Mammalian modifier 1 and modifier 2.

- Fission yeast swi6, a protein involved in the repression of the silent mating-type loci mat2 and mat3.
[0400] Class B. 

- Drosophila protein Polycomb (Pc). 

- Mammalian modifier 3, a homolog of Pc. 

- Drosophila protein Su(var)3-9, a suppressor of position-effect variegation.

- Human Mi-2 autoantigen, characteristic of dermatomyositis.

- Fission yeast hypothetical protein SpAC18G6.02c.

- Caenorhabditis elegans hypothetical protein C29H12.6.

- Caenorhabditis elegans hypothetical protein ZK1236.2.

- Caenorhabditis elegans hypothetical protein T09A5.8.

[0401] Class C.
- Mammalian DNA-binding/helicase proteins CHD-1 to CHD-4.
- Yeast protein CHD1.






[ 1] Paro R. Trends Genet. 6:416-421(1990).

[ 3] Aasland R., Stewart A.F. Nucleic Acids Res. 23:3168-3173(1995).
[ 4] Koonin E.V., Zhou S., Lucchesi J.C. Nucleic Acids Res. 23:4229-4233(1995).
[0403] 118. citrate_synt

Citrate synthase (EC 4.1.3.7) catalyzes the condensation of acetyl-CoA and oxaloacetate to form citrate, forming a
carbon-carbon bond in the absence of metal ion cofactors. One form of the enzyme is found in the mitochondrial matrix;
the second is cytoplasmic. Both seem to be dimers of identical chains.

- Consensus pattern: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R [H is an active site residue]
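In the citrate synthase signature, the histidine is annotated as an active-site residue, and the x(1,2) element lets the pattern span two different lengths. The sketch below translates the pattern by hand into a Python regex. It captures the histidine so that each hit reports the residue's position, and it exercises both gap lengths. The helper name and the toy sequences are hypothetical.

```python
import re

# Hand translation of G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R,
# capturing the histidine annotated as an active-site residue.
CITRATE_SYNTHASE = re.compile(r"G[FYA][GA](H).[IV].{1,2}[RKT].{2}D[PS]R")


def active_site_histidines(sequence: str) -> list[int]:
    """Return 1-based positions of the pattern's active-site histidine."""
    return [m.start(1) + 1 for m in CITRATE_SYNTHASE.finditer(sequence)]


# Hypothetical toy sequences showing both lengths allowed by x(1,2).
print(active_site_histidines("MSK" + "GFGHAIARAADPR"))   # [7]
print(active_site_histidines("MSK" + "GFGHAIAARAADPR"))  # [7]
```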

[ 1] ... Biochemistry 29:2213-2219(1990).

Chaperonin clpA/B

Number of members: 39 

... have been shown [1,2] to be evolutionarily related. These proteins are listed below:

- Yeast mitochondrial heat shock protein 78 (gene HSP78) [3].

- Porphyra purpurea chloroplast encoded clpC.

In addition to the ATP-binding A and B motifs, these proteins share many conserved regions. Two of these regions have
been selected as signature patterns. The first pattern is located ... the ATP-binding B motif. The second pattern is
located in the domain in-between the ATP-binding A and B motifs.

- Consensus pattern: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G

- Consensus pattern: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-...
[ 1] Gottesman S., Squires C., Pichersky E., Carrington M., Hobbs M., Mattick J.S., Dalrymple B., Kuramitsu H.,
Shiroza T., Foster T., Clark W.P., Ross B., Squires C.L., Maurizi M.R. Proc. Natl. Acad. Sci. U.S.A. 87:3513-3517
(1990).

[ 2] Parsell D.A., Sanchez Y., Stitzel J.D., Lindquist S. Nature 353:270-273(1991).

[ 3] Leonhardt S.A., Fearon K., Danese P.N., Mason T.L. Mol. Cell. Biol. 13:6304-6313(1993).

[0410] 120. cofilin_ADF
Cofilin/tropomyosin-type actin-binding proteins
[1] 

Medline: 97290449 
Structure determination of yeast cofilin. 
Fedorov AA, Lappalainen P, Fedorov EV, Drubin DG. Almo SC; 
Nat Struct Biol 1997;4:366-369.
[2] 

Medline: 97290450 

Crystal structure of the actin-binding protein actophorin from Acanthamoeba.
Leonard SA, Gittis AG, Petrella EC, Pollard TD, Lattman EE;
Nat Struct Biol 1997;4:369-373.
[3] 

Medline: 97420794 

... uncoupled by mutation of conserved tyrosine residues in maize actin depolymerizing factor.

Jiang CJ, Weeds AG, Khan S, Hussey PJ;
Proc Natl Acad Sci U S A 1997;94:9973-9978 
[4] 

Medline: 97357155 

Cofilin promotes rapid actin filament turnover in vivo.

Lappalainen P, Drubin DG;
Nature 1997;388:78-82.

Severs actin filaments and binds to actin monomers. 
Number of members: 44 

[0411] Actin-depolymerizing proteins sever actin filaments (F-actin) and/or bind to actin monomers (G-actin), thus
preventing actin polymerization by sequestering the monomers. The following proteins are evolutionarily related and
belong to a family of low molecular weight (137 to 166 residues) actin-depolymerizing proteins [1,2,3,4]:

- Cofilin from vertebrates, slime mold and yeast. Cofilin binds to F-actin and acts as a pH-dependent
actin-depolymerizing protein.

- Destrin from vertebrates. Destrin binds to G-actin in a pH-independent manner and prevents polymerization.
- Caenorhabditis elegans unc-60.

- Acanthamoeba castellanii actophorin.
- Plant actin depolymerizing factor (ADF).

[0412] The most conserved region of these proteins is a twenty amino-acid segment that ends some 30 residues
from their C-terminal extremity. This segment has been shown [5] to be important for actin binding.

- Consensus pattern: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-...

[ 1] Hawkins M., Pope B., MacIver S.K., Weeds A.G. Biochemistry 32:9985-9993(1993).

[ 2] Iida K., Moriyama K., Matsumoto S., Kawasaki H., Nishida E., Yahara I. Gene 124:115-120(1993).

[ 3] Quirk S., MacIver S.K., Ampe C., Doberstein S.K., Kaiser D.A., van Damme J., Vandekerckhove J., Pollard T.D.
Biochemistry 32:8525-8533(1993).

[ 4] McKim K.S., Matheson C., Marra M.A., Wakarchuk M.F., Baillie D.L. Mol. Gen. Genet. 242:346-357(1994).
[ 5] Moriyama K., Yonezawa N., Sakai H., Yahara I., Nishida E. J. Biol. Chem. 267:7240-7244(1992).

[0413] 121. Respiratory-chain NADH dehydrogenase 24 Kd subunit signature

Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase)
is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the
chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits of
this bioenergetic complex, the 24 Kd subunit is a component of the iron-sulfur (IP) fragment of the enzyme. It seems
to bind a 2Fe-2S iron-sulfur cluster. The 24 Kd subunit is evolutionarily related to [3,4]:

- Subunit E of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoE).
- Subunit NQO2 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

A highly conserved region located in the central section of this subunit, which contains two conserved cysteines,
has been selected as a signature pattern.

- Consensus pattern: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P [The two C's are putative 2Fe-2S ligands]
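A regex can do more than detect the 24 Kd subunit signature. It can also report where the two cysteines annotated as putative 2Fe-2S ligands fall in a sequence. The sketch below gives each of those cysteines its own capture group. The toy sequence is hypothetical and is not a real NADH dehydrogenase subunit.

```python
import re

# Hand translation of D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P with one
# capture group per cysteine annotated as a putative 2Fe-2S ligand.
NADH_24KD = re.compile(r"D.{2}F[ST].{5}(C)LG.(C).{2}[GA]P")

# Hypothetical toy sequence; not a real NADH dehydrogenase subunit.
toy_sequence = "MK" + "DAAFSAAAAACLGACAAGP"
hit = NADH_24KD.search(toy_sequence)
ligands = (hit.start(1) + 1, hit.start(2) + 1)  # 1-based positions
print(ligands)  # (13, 17)
```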
[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987). 

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

[ 3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992).

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[0414] 122. copper-bind 

Copper binding proteins, plastocyanin/azurin family

Number of members: 70 

[0415] Blue or 'type-1' copper proteins are small proteins which bind a single copper atom and which are characterized
by an intense electronic absorption band near 600 nm. The best known members of this class of proteins are the plant
chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly related bacterial azurins,
which exchange electrons with cytochrome c551. This family of proteins also includes the proteins
listed below (references are only provided for recently determined sequences).

- Amicyanin from bacteria such as Methylobacterium extorquens or Thiobacillus versutus that can grow on methylamine.
Amicyanin appears to be an electron receptor for methylamine dehydrogenase.

- Blue copper protein from Alcaligenes faecalis. 

- Cupredoxin (CPC) from cucumber peelings [4]. 

- Cusacyanin (basic blue protein; plantacyanin. CBP) from cucumber. 

- Halocyanin from Natronobacterium pharaonis [5], a membrane-associated copper-binding protein.

- Pseudoazurin from Pseudomonas.

- Stellacyanin from the Japanese lacquer tree.
- Umecyanin from horseradish roots.

[0416] Although there is an appreciable amount of divergence in the sequence of all these proteins, the copper ligand
sites are conserved, and a pattern which includes two of the ligands (a cysteine and a histidine) has been developed.

- Consensus pattern: [GA]-x(0,2)-[YSA]-x(0,1)-...

[ 2] ... (1984).

[ 4] Mann K., Schaefer W., Thoenes U., Messerschmidt A., Mehrabian Z., Nalbandyan R. FEBS Lett. 314:220-223.

[ 5] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945.
[ 6] Yano T., Fukumori Y., Yamanaka T. FEBS Lett. 288:159-162(1991).
[0417] 123. Chaperonins cpn10 signature

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
They seem to assist other polypeptides in maintaining or assuming conformations which permit their correct assembly
into oligomeric structures. Chaperonins form oligomeric complexes and are composed of two different types of subunits:
a 60 Kd protein, known as cpn60 (groEL in bacteria), and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn10
protein binds to cpn60, and its sequence is well conserved in bacteria, mitochondria and plant chloroplasts.

Consensus pattern: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,9)-[SG]-x-[LIVMFY](3)

Note: this pattern is found twice in the plant chloroplast protein which consists of the tandem repeat of a cpn10
domain.
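The note above says the cpn10 pattern occurs twice in the chloroplast protein made of two tandem cpn10 domains. Counting the non-overlapping matches of the pattern in a sequence makes that check concrete. The sketch below does this with `re.findall`. The toy "tandem" sequence is hypothetical: two minimal matching units joined by a short glycine linker, chosen only to mimic a two-domain arrangement.

```python
import re

# Hand translation of the cpn10 signature given above.
CPN10 = re.compile(
    r"[LIVMFY].P[ILT].[DEN][KR][LIVMFA]{3}[KREQ].{8,9}[SG].[LIVMFY]{3}"
)

# Hypothetical toy "tandem" protein: two copies of a minimal matching unit
# separated by a short linker, mimicking the two-domain chloroplast protein.
unit = "LAPIADKLLLK" + "A" * 8 + "SALLL"
toy_tandem = unit + "GGGG" + unit
print(len(CPN10.findall(toy_tandem)))  # 2
```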

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991).

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[ 3] ... Hoogenraad N.J., Condron R., Hoj P.B. Proc. Natl. Acad. Sci. U.S.A. 89:3394-3398(1992).

[ 4] ... Proc. Natl. Acad. Sci. U.S.A. 89:8696-8700(1992).
[ 5] Hunt J.F., Weaver A.J., Landry S.J., Gierasch L., Deisenhofer J. Nature 379:37-45(1996).

[0418] 124. Chaperonins cpn60 signature (cpn60_TCP1)

Chaperonins [1,2] are proteins involved in the folding of proteins or the assembly of oligomeric protein complexes.
They seem to assist other polypeptides to maintain or assume conformations which permit their correct assembly into
oligomeric structures. They are found in abundance in prokaryotes, chloroplasts and mitochondria. Chaperonins form
oligomeric complexes and are composed of two different types of subunits: a 60 Kd protein, known as cpn60 (groEL in
bacteria), and a 10 Kd protein, known as cpn10 (groES in bacteria). The cpn60 protein shows weak ATPase activity and
is a highly conserved protein of about 550 to 580 amino acid residues which has been described by different names in
different species:

- Escherichia coli groEL protein, which is essential for the growth of the bacteria and the assembly of several
bacteriophages.
- Cyanobacterial groEL analogues.
- Mycobacterium tuberculosis and ...; Rickettsia tsutsugamushi ...; and Chlamydial 57 Kd hypersensitivity antigen
(gene hypB).
- Chloroplast RuBisCO subunit binding-protein alpha and beta chains, which bind ribulose bisphosphate carboxylase
small and large subunits and are implicated in the assembly of the enzyme.
- Mammalian mitochondrial matrix protein P1 (mitonin or P60).
- Yeast HSP60 protein.

As a signature pattern for these proteins, a well-conserved region of twelve residues, located in the last third of
the cpn60 sequence, was chosen.

Consensus pattern: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]
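Statements such as "a well-conserved region of twelve residues" can be checked against the pattern itself. Each element covers one residue unless it carries an (n) or (n,m) repeat count. The sketch below totals the minimum and maximum span of a PROSITE-style pattern. The helper name is hypothetical, and like the other sketches it ignores terminal anchors.

```python
import re


def pattern_length_range(pattern: str) -> tuple[int, int]:
    """Minimum and maximum number of residues a PROSITE-style pattern spans."""
    shortest = longest = 0
    for element in pattern.rstrip(".").split("-"):
        # Each element covers one residue unless followed by (n) or (n,m).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        low = int(match.group(2)) if match.group(2) else 1
        high = int(match.group(3)) if match.group(3) else low
        shortest += low
        longest += high
    return shortest, longest


print(pattern_length_range("A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA]"))            # (12, 12)
print(pattern_length_range("C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C"))  # (20, 21)
```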

[ 1] Ellis R.J., van der Vies S.M. Annu. Rev. Biochem. 60:321-347(1991) 

[ 2] Zeilstra-Ryalls J., Fayet O., Georgopoulos C. Annu. Rev. Microbiol. 45:301-325(1991).

[0419] Chaperonins TCP-1 signatures (cpn60_TCP1) 

The TCP-1 protein (Tailless Complex Polypeptide 1) was first identified in mice, where it is especially abundant in
testis but present in all cell types. It has since been found and characterized in many other mammalian species, in
Drosophila and in yeast. TCP-1 is a highly conserved protein of about 60 Kd (556 to 560 residues) which participates
in a ... particle [3] with 6 to 8 other different subunits. These CCT (chaperonin containing TCP-1) subunits, ...
epsilon, zeta and eta, are evolutionarily related to TCP-1. The CCT complex acts as a molecular chaperone for
tubulin, actin and probably some other proteins.

[0420] The CCT subunits are highly related to archaebacterial counterparts:

- TF55 and TF56 [5], a molecular chaperone from Sulfolobus shibatae. TF55 has ATPase activity, is known to bind
unfolded polypeptides and forms a complex of two stacked nine-membered rings.
- Thermosome [7], from Thermoplasma acidophilum, which is composed of two subunits (alpha and beta) and also seems to
be a chaperone with ATPase activity.

The TCP-1 family is distantly related to the cpn60 chaperonin family (see <PDOC00268>). As signature patterns of this
family of chaperonins, three conserved regions located in the N-terminal domain were chosen.
Consensus pattern: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2)

Consensus pattern: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-...
Consensus pattern: ...-[DEK]-x-x-[LIVMGTA]-[GA]-D-G-T-...

[ 1] Ellis J. Nature 358:191-192(1992). 

[ 2] Nelson R.J., Craig E.A. Curr. Biol. 2:487-489(1992). 

[ 3] Lewis V.A., Hynes G.M., Zheng D., Saibil H., Willison K.R. Nature 358:249-252(1992).
[ 4] Kubota H., Hynes G., Carne A., Ashworth A., Willison K.R. Curr. Biol. 4:89-99(1994).

[ 5] Kim S., Willison K.R., Horwich A.L. Trends Biochem. Sci. 20:543-548(1994).

[ 6] Trent J.D., Nimmesgern E., Wall J.S., Hartl F.U., Horwich A.L. Nature 354:490-493(1991).

[ 7] Waldmann T., Lupas A., Kellermann J., Peters J., Baumeister W. Biol. Chem. Hoppe-Seyler 376:119-126(1995).

[ 8] Hemmingsen S.M. Nature 357:650-650(1992).

[0421] 125. cyclin (Cyclins)

The cyclins include an internal duplication, which is related to that found in TFIIB and the RB protein. 
Medline: 94203808 

Evidence for a protein domain superfamily shared by the cyclins 
TFIIB and RB/p107.
Gibson TJ, Thompson JD, Blocker A, Kouzarides T;

Nucleic Acids Res 1994;22:946-952.

[2] 

Medline: 96164440 
The crystal structure of cyclin A.
Brown NR, Noble MEM, Endicott JA, Garman EF, Wakatsuki S,
Mitchell E, Rasmussen B, Hunt T, Johnson LN;
Structure 1995;3:1235-1247.
Complex of cyclin and cyclin-dependent kinase.
[3]

Medline: 96313126

Structural basis of cyclin-dependent kinase activation by phosphorylation.
Russo AA, Jeffrey PD, Pavletich NP;
Nat Struct Biol 1996;3:696-700.
Cyclins regulate cyclin-dependent kinases (CDKs).

The most divergent PROSITE members have been included. Swiss:P22674, uracil-DNA glycosylase 2, is the highest-scoring
noise hit and may be related, but has not been included.

Number of members: 189 

[0422] Cyclins [1,2.3] are eukaryotic proteins which play an active role in controlling nuclear cell division cycles 
Cychns, together wrth the p34 (cdc2) or cdk2 kinases, form the Maturation Promoting Factor (MPF) Tere are^^ 
mam groups of cyclins: /• ^.^ aio iwu 

- G2/M cyclins, essential for the control of the cell cycle at the G2/M (mitosis) transition. G2/M cyclins accumulate
steadily during G2 and are abruptly destroyed as cells exit from mitosis (at the end of the M-phase).

- G1/S cyclins, essential for the control of the cell cycle at the G1/S (start) transition.

[0423] In most species, there are multiple forms of G1 and G2 cyclins. For example, in vertebrates, there are two G2 cyclins, A and B, and at least three G1 cyclins, C, D, and E.

[0424] A cyclin homolog has also been found in herpesvirus saimiri [4].

[ 1] Nurse P. Nature 344:503-508(1990).

[ 2] Norbury C., Nurse P. Curr. Biol. 1:23-24(1991).

[ 3] Lew D.J., Reed S.I. Trends Cell Biol. 2:77-81(1992).

[ 4] Nicholas J., Cameron K.R., Honess R.W. Nature 355:362-365(1992).

[0426] 126. Cystatin domain

This is a very diverse family. Attempts to define separate subfamilies have failed. Typically, either the N-terminal or C-terminal ...

Cathelicidins are related to this family but have not been included. Number of members: 147

[0427] Inhibitors of cysteine proteases [1,2,3], which are found in the tissues and body fluids of animals, in the larva
of the worm Onchocerca volvulus [4], as well as in plants, can be grouped into three distinct but related families:

- Type 1 cystatins (or stefins), molecules of about 100 amino acid residues with neither disulfide bonds nor carbohydrate groups.

- Type 2 cystatins, molecules with two disulfide loops near their C-terminal end.

- Kininogens, which are multifunctional plasma glycoproteins.

They are the precursor of the active peptide bradykinin and play a role in blood coagulation by helping to ... a domain (of variable length) ...

A conserved region ... has been proposed to be important for the binding to the cysteine proteases. The consensus pattern starts one residue before this conserved region.

- Consensus pattern: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DEN...

[1] Barrett A.J. Trends Biochem. Sci. 12:193-196(1987).
[2] Rawlings N.D., Barrett A.J. J. Mol. Evol. 30:60-71(1990).
[3] Turk V., Bode W. FEBS Lett. 285:213-219(1991).

[4] Lustigman S., Brotman B., Huima T., Prince A.M. Mol. Biochem. Parasitol. 45:65-76(1991).

[0430] 127. cytochrome_c (Cytochrome c)

The Pfam entry does not include all PROSITE members.

The cytochrome 556 and cytochrome c' families are not included. 
Number of members: 259 

[0431] In proteins belonging to the cytochrome c family [1], the heme group is covalently attached by thioether bonds to
two conserved cysteine residues. The consensus sequence for this site is Cys-X-X-Cys-His, and the histidine residue is one
of the two axial ligands of the heme iron. This arrangement is shared by all proteins known to belong to the cytochrome c
family, which presently includes cytochromes c, c', c1 to c6, c550 to c556, cc3/Hmc, cytochrome f and reaction center cytochrome c.

- Consensus pattern: C-{CPWHF}-{CPWR}-C-H-{CFYW}
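Every signature in this section is written in the same PROSITE pattern notation, so a short translator into a regular expression shows how such a pattern is applied to a protein sequence. The following is a minimal sketch, not PROSITE's own software: the function name `prosite_to_regex` is hypothetical, only the notation used in this section is handled (`x`, `[...]`, `{...}` and repeat counts `(n)` or `(n,m)`), and the example sequence is invented.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex.

    Supported: 'x' (any residue), '[ABC]' (any listed residue),
    '{ABC}' (any residue except those listed), a single residue letter,
    and a repeat count '(n)' or '(n,m)' after an element.
    Bracketed notes such as '[C is the active site residue]' must be
    removed before calling this function.
    """
    parts = []
    # A stray trailing dash or final period is ignored.
    for element in pattern.strip().strip("-.").split("-"):
        m = re.fullmatch(
            r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?",
            element.strip(),
        )
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # exclusion set
        else:
            regex = core  # single residue or [ABC] character class
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(regex)
    return "".join(parts)


heme_site = prosite_to_regex("C-{CPWHF}-{CPWR}-C-H-{CFYW}")
print(heme_site)  # C[^CPWHF][^CPWR]CH[^CFYW]

seq = "MKAGDKCAQCHTVEK"  # invented sequence, for illustration only
hit = re.search(heme_site, seq)
print(hit.start() + 1, hit.group())  # 7 CAQCHT
```

The exclusion sets become negated character classes, so the reported hit `CAQCHT` carries the Cys-X-X-Cys-His arrangement described in [0431].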

[0432] [ 1] Mathews F.S. Prog. Biophys. Mol. Biol. 45:1-56(1985). 

[0433] 128. (DAGKa) Diacylglycerol kinase accessory domain (presumed) 

[0434] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. This domain is assumed
to be an accessory domain; its function is unknown.
[0435] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T; Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H; J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, et al.; FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F; Trends Biochem Sci 1990;15:47-50.

[0436] 129. (DAGKc) Diacylglycerol kinase catalytic domain (presumed)

[0437] Diacylglycerol (DAG) is a second messenger that acts as a protein kinase C activator. The catalytic domain
is assumed from the finding of bacterial homologues.

[0438] [1] Sakane F, Yamada K, Kanoh H, Yokoyama C, Tanabe T; Nature 1990;344:345-348. [2] Sakane F, Imai S,
Kai M, Wada I, Kanoh H; J Biol Chem 1996;271:8394-8401. [3] Schaap D, de Widt J, et al.; FEBS Lett 1990;275:151-158. [4]
Kanoh H, Yamada K, Sakane F; Trends Biochem Sci 1990;15:47-50.

[0439] 130. D-amino acid oxidases signature (DAO)

[0440] D-amino acid oxidase (EC 1.4.3.3) (DAMOX or DAO) is an FAD flavoenzyme that catalyzes the oxidation of
neutral and basic D-amino acids into their corresponding keto acids. DAOs have been characterized and sequenced in
fungi and vertebrates, where they are known to be located in the peroxisomes. D-aspartate oxidase (EC 1.4.3.1) (DASOX) [1]
is an enzyme structurally related to DAO that catalyzes the same reaction but is active only toward dicarboxylic
D-amino acids. In DAO, a conserved histidine has been shown [2] to be important for the enzyme's catalytic activity.
The region around this residue was used as a signature pattern for these enzymes.

[0441] Consensus pattern: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A [H is a probable active site residue]
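The DAO signature annotates its His as a probable active-site residue. A minimal sketch, assuming a hand translation of the pattern into Python's `re` syntax, shows how that His can be located by wrapping it in a named group; `DAO_SIGNATURE`, `probable_active_site_his` and the test sequence are illustrative inventions, not part of PROSITE or of this description.

```python
import re

# Hand translation of: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A
# The annotated His is captured in the named group "his".
DAO_SIGNATURE = re.compile(r"[LIVM]{2}(?P<his>H)[NHA]YG.[GSA]{2}.G.{5}G.A")


def probable_active_site_his(seq):
    """Return the 1-based position of the signature's His, or None."""
    m = DAO_SIGNATURE.search(seq.upper())
    return m.start("his") + 1 if m else None


# Invented sequence: "MSK" + a segment built to fit the pattern + "RRE".
example = "MSKLVHNYGEGSTGAAAAAGKARRE"
print(probable_active_site_his(example))    # 6
print(probable_active_site_his("MSKTEST"))  # None
```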
[ 1] Negri A., Ceciliani F., Tedeschi G., Simonic T., Ronchi S. J. Biol. Chem. 267:11865-11871(1992).

[ 2] Miyano M., Fukui K., Watanabe F., Takahashi S., Tada M., Kanashiro M., Miyake Y. J. Biochem. 109:171-177.

[0442] 131. DEAD and DEAH box families ATP-dependent helicases signatures 

A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural
similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong to
this family are: - Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight
complex ... This protein is an ATP-dependent RNA-helicase.
- PL10, a mouse protein expressed specifically during spermatogenesis. - An3, a Xenopus putative RNA helicase. - ...
pre-mRNA splicing and ... - MSS116, a yeast protein required for mitochondrial splicing. - ... antigen. ... putative
RNA helicase related to p68. - DBP2, a yeast protein related to p68. - DHH1, a yeast protein. - DRS1, a yeast ... - ...,
a Drosophila protein important for oocyte formation and ... - ..., a Drosophila maternally expressed protein of unknown
function. - dbpA, an Escherichia coli putative RNA helicase. - deaD, an Escherichia coli putative RNA helicase which can
suppress a mutation in the rpsB gene for ribosomal protein S2. - rhlB, an Escherichia coli putative RNA helicase. - ...
Caenorhabditis elegans hypothetical proteins T26G10.1, ZK512.2 and ZK686.2. - Yeast hypothetical protein YHR065c.
- Yeast hypothetical protein YHR169w. - Fission yeast hypothetical protein SpAC31A2.07c. - Bacillus subtilis
hypothetical protein yxiN.

All these proteins share a number of conserved sequence motifs. Some of them are specific to this family, while
others are shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' [4,E1]. One of
these motifs, called the 'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Some other
proteins belong to a subfamily which have His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins
[3,5,6,E1]. Proteins currently known to belong to this subfamily are: - PRP2, PRP16, PRP22 and PRP43. These yeast
proteins are involved in various steps of the pre-mRNA splicing process. - Fission yeast prh1, which may be involved in
pre-mRNA splicing. - ... involved in dosage compensation of X chromosome linked genes. ... RAD3 from yeast. RAD3 is a
DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or cross-linking agents. Fission
yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2) are homologs of RAD3. - Yeast CHL1 (or
CTF1), which is important for chromosome transmission and normal cell cycle progression in G(2)/M. - Yeast TPS1.
- Yeast hypothetical protein YKL078w. - Caenorhabditis elegans hypothetical ... helicase. - hrpA, an Escherichia coli
putative RNA helicase. Signature patterns were developed for both subfamilies.

[0443] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

Note: these proteins also contain a copy of the ATP/GTP-binding motif 'A' (P-loop); see the relevant entry <PDOC00017>.
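The DEAD-box and DEAH-box signatures differ mainly at the residue after D-E-A, so a sequence can be assigned to one subfamily or the other by testing both patterns. This is a sketch under stated assumptions: both regular expressions are hand translations of the patterns in [0443], the damaged last element of the first pattern is read as [LIVMFYGSTN], and the names `BOX_SIGNATURES` and `helicase_box_types` as well as the sequences are hypothetical.

```python
import re

# Hand translations of the two [0443] signatures.
BOX_SIGNATURES = {
    "DEAD": re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]"),
    "DEAH": re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]"),
}


def helicase_box_types(seq):
    """List which of the two box signatures occur in a protein sequence."""
    return [name for name, rx in BOX_SIGNATURES.items() if rx.search(seq)]


# Invented sequences, built to fit (or miss) the patterns.
print(helicase_box_types("MAGVLDEADRMLDMG"))  # ['DEAD']
print(helicase_box_types("MKSGLIVDEAHER"))    # ['DEAH']
print(helicase_box_types("MKTTTPPP"))         # []
```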

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[ 2] Linder P., Lasko P.F., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., ...

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).

[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).

[0444] 132. (DHBP_synthase) 3,4-dihydroxy-2-butanone 4-phosphate synthase

3,4-Dihydroxy-2-butanone 4-phosphate is biosynthesized from ribulose 5-phosphate and serves as the biosynthetic
precursor for the xylene ring of riboflavin. ... 1997;280:374-382.

[0446] 133. (DHDPS) Dihydrodipicolinate synthetase signatures

Dihydrodipicolinate synthetase (EC 4.2.1.52) (DHDPS) [1] catalyzes, in the chloroplasts of higher plants and in many bacteria
(gene dapA), the first reaction specific to the biosynthesis of lysine and of diaminopimelate. DHDPS is responsible for
the condensation of aspartate semialdehyde and pyruvate by a ping-pong mechanism in which pyruvate first binds to
the enzyme by forming a Schiff-base with a lysine residue. Three other proteins are structurally related to DHDPS and
probably also act via a similar catalytic mechanism: - Escherichia coli N-acetylneuraminate lyase (EC 4.1.3.3) (gene
nanA), which catalyzes the condensation of N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate.
- Rhizobium meliloti protein mosA [3], which is involved in the biosynthesis of the rhizopine 3-O-methyl-scyllo-inosamine.
- Escherichia coli hypothetical protein yjhH. Two signature patterns for these enzymes were developed. The first one
is located in the N-terminal section of these proteins. The second signature contains a lysine
residue which has been shown, in Escherichia coli dapA [2], to be the one that forms a Schiff-base with the substrate.
[0448] Consensus pattern: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(6)-[EQ]-...
Consensus pattern: ...-[ST]-...-x-[SGA]-[LIVMF]-K-[DEQAF]-...

[ 1] Kaneko T., Hashimoto T., Kumpaisal R., Yamada Y. J. Biol. Chem. 265:17451-17455(1990).

[ 2] Laber B., Gomis-Rueth F.-X., Romao M.J., Huber R. Biochem. J. 288:691-695(1992).

[ 3] Murphy P.J., Trenz S.P., Grzemski W., de Bruijn F.J., Schell J. J. Bacteriol. 175:5193-5204(1993).

[0449] 134. (DHOdehase) Dihydroorotate dehydrogenase signatures

Dihydroorotate dehydrogenase (EC 1.3.3.1) (DHOdehase) catalyzes the fourth step in the de novo biosynthesis of
pyrimidine, the conversion of dihydroorotate into orotate. DHOdehase is a ubiquitous FAD flavoprotein. In bacteria
(gene pyrD), DHOdehase is located on the inner side of the cytosolic membrane. In some yeasts, such as in Saccharomyces
cerevisiae (gene URA1), it is a cytosolic protein, while in other eukaryotes it is found in the mitochondria [1]. Two
signature patterns specific to this enzyme were developed. The first corresponds to a region in the N-terminal section
of the enzyme, while the second is located in the C-terminal section and seems to be part of the FAD-binding domain.

Consensus pattern: ...-[LIVFSTA]-[GT]-x(3)-[NQR]-x-G-[NHY]-x(2)-P-[RT]
[0450] Consensus pattern: [LIVM](2)-[GSA]-x-G-G-[IV]-x-[STGDN]-x(3)-[ACV]-x(6)-G-A
[0451] [ 1] Nagy M., Lacroute F., Thomas D. Proc. Natl. Acad. Sci. U.S.A. 89:8966-8970(1992).
[0452] 135. (DMRL_synthase) 6,7-dimethyl-8-ribityllumazine synthase
[0453] 136. (DNA_methylase) C-5 cytosine-specific DNA methylases signatures

C-5 cytosine-specific DNA methylases (EC 2.1.1.37) (C5 Mtase) are enzymes that specifically methylate the C-5 carbon
of cytosines in DNA [1,2,3]. Such enzymes are found in the proteins described below: - As a component of type II
restriction-modification systems in prokaryotes and some bacteriophages. Such enzymes recognize a specific DNA
sequence where they methylate a cytosine. In doing so, they protect DNA from cleavage by type II restriction enzymes
that recognize the same sequence. The sequences of a large number of type II C-5 Mtases are known. - In vertebrates,
... The sequence of the mammalian enzyme is known.

... A conserved cysteine has been shown [4] to be involved in the catalytic mechanism ...

[0454] Consensus pattern: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S [C is the active site residue]
Consensus pattern: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM]-...
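The first C-5 methylase signature marks its cysteine as the active-site residue, and in the pattern as written that cysteine directly follows a proline. A minimal sketch reports that residue by capturing it in a named group. Assumptions: the regex is a hand translation of the first pattern in [0454], and `C5_MTASE_MOTIF`, `catalytic_cys_position` and the sequence are invented for illustration.

```python
import re

# Hand translation of: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S
# The annotated Cys is captured in the named group "cys".
C5_MTASE_MOTIF = re.compile(r"[DENKS].[FLIV].{2}[GSTC].P(?P<cys>C).{2}[FYWLIM]S")


def catalytic_cys_position(seq):
    """Return the 1-based position of the annotated Cys, or None if absent."""
    m = C5_MTASE_MOTIF.search(seq)
    return m.start("cys") + 1 if m else None


example = "MQRDAFGGSGPCQAFSTLE"  # invented sequence for illustration
pos = catalytic_cys_position(example)
print(pos, example[pos - 2:pos])  # 12 PC
```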

[ 1] Posfai J., Bhagwat A.S., Roberts R.J. Gene 74:261-263(1988).

[ 2] Kumar S., Cheng X., Klimasauskas S., Mi S., Posfai J., Roberts R.J., Wilson G.G. Nucleic Acids Res. 22:1-10(1994).
[ 3] Lauster R., Trautner T.A., Noyer-Weidner M. J. Mol. Biol. 206:305-312(1989).

[ 4] Chen L., MacMillan A.M., Chang W., Ezaz-Nikpay K., Lane W.S., Verdine G.L. Biochemistry 30:11018-11025(1991).
[0455] 137. (DNAphotolyase) DNA photolyases class 2 signatures

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged
DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring
joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors for its
activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized 8-hydroxy-5-deazaflavin
(8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna, while the FADH2
chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities [3], DNA
photolyases can be grouped into two classes. The second class contains enzymes from Myxococcus xanthus, ... There are
a number of conserved sequence regions in all known class 2 DNA photolyases, especially in the C-terminal part. Two of
these regions were selected as signature patterns.
Consensus pattern: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F- 
Consensus pattern: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N- 

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987) 
[ 2] Jorns M.S. Biofactors 2:207-211(1990).

[ 3] ... EMBO J. 13:6143-6151(1994).

[0456] (DNAphotolyase2) DNA photolyases class 1 signatures 

Deoxyribodipyrimidine photolyase (EC 4.1.99.3) (DNA photolyase) [1,2] is a DNA repair enzyme. It binds to UV-damaged
DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring
joining the two pyrimidines of the dimer. DNA photolyase is an enzyme that requires two chromophore-cofactors for its
activity: a reduced FADH2 and either 5,10-methenyltetrahydrofolate (5,10-MTHF) or an oxidized 8-hydroxy-5-deazaflavin
(8-HDF) derivative (F420). The folate or deazaflavin chromophore appears to function as an antenna, while the FADH2
chromophore is thought to be responsible for electron transfer. On the basis of sequence similarities [3], DNA
photolyases can be grouped into two classes. The first class contains enzymes from Gram-negative and Gram-positive
bacteria, the halophilic archaebacteria Halobacterium halobium, fungi and plants. Class 1 enzymes bind either
5,10-MTHF (E. coli, fungi, etc.) or 8-HDF (S. griseus, H. halobium). This family also includes Arabidopsis thaliana
cryptochromes 1 (CRY1) and 2 (CRY2), which are blue light photoreceptors that mediate blue light-induced gene expression.
There are a number of conserved sequence regions in all known class 1 DNA photolyases, especially in the C-terminal
part. Two of these regions were selected as signature patterns.

[0457] Consensus pattern: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM]-

Consensus pattern: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-F-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ]-

[ 1] Sancar G.B., Sancar A. Trends Biochem. Sci. 12:259-261(1987).
[ 2] Jorns M.S. Biofactors 2:207-211(1990).

[ 3] ... EMBO J. 13:6143-6151(1994).

[ 4] Lin C., Ahmad M., Cashmore A.R. Plant J. 10:893-902(1996).

[0458] 138. (DNA_pol_A)

DNA polymerase family A signature 

Replicative DNA polymerases (EC 2.7.7.7) are the key enzymes catalyzing the accurate replication of DNA. They
require either a small RNA molecule or a protein as a primer for the de novo synthesis of a DNA chain. On the basis
of sequence similarities, a number of DNA polymerases have been grouped together [1,2,3] under the designation of
DNA polymerase family A. The polymerases that belong to this family are listed below:

- Escherichia coli and various other bacterial polymerase I (gene polA).
- Thermus aquaticus Taq polymerase.
- Bacteriophage SPO1 polymerase.
- Bacteriophage SPO2 polymerase.
- Bacteriophage T5 polymerase.
- Bacteriophage T7 polymerase.
- Mycobacteriophage L5 polymerase.
- Yeast mitochondrial polymerase gamma (gene MIP1).

There are a number of conserved regions in these polymerases. One of these conserved regions, known as ..., is
thought to interact with the nucleotide substrates; it contains a conserved tyrosine which has been shown, by
photoaffinity labelling, to be in the active site. This region was used as a signature for this family of DNA polymerases.

Consensus pattern: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA]. Sequences known
to belong to this class detected by the pattern: ALL.

[ 1] Delarue M., Poch O., Tordo N., Moras D., Argos P. Protein Eng. 3:461-467(1990).
[ 2] Ito J., Braithwaite D.K. Nucleic Acids Res. 19:4045-4057(1991).
[ 3] Braithwaite D.K., Ito J. Nucleic Acids Res. 21:787-802(1993).

[0461] 139. DNA_pol_viral_C

DNA polymerase (viral) C-terminal domain 

Number of members: 128 

[0462] 140. (DNA_topoisoII)

DNA topoisomerase II signature 

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4] is an enzyme that catalyzes the interconversion of topological DNA
isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through a transient double-strand
break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes, and in African Swine Fever virus
(ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the products of genes 39, 52 and 60). In
prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits (genes gyrA and gyrB) [E2].
In some bacteria, a second type II topoisomerase has been identified; it is known as topoisomerase IV. In eukaryotes,
type II topoisomerase is a homodimer.

There are many regions of sequence homology between the different subtypes of topoisomerase II. The relation between
the different subunits is shown in the following representation:

<------------------ About 1400 residues ------------------>

[ Protein 39  * ][ Protein 52  ]   Phage T4
[ gyrB        * ][ gyrA        ]   Prokaryote II and archaebacteria
[ parE        * ][ parD        ]   Prokaryote IV
[             *                ]   Eukaryote and ASF

'*': position of the pattern.

As a signature pattern for this family, a region containing a conserved pentapeptide was selected; it is located in
gyrB, parE and protein 39 of phage T4 topoisomerase ...

Consensus pattern: ...-G-[DN]-S-A-x-[STAG]. Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990).

[ 2] Bjornsti M.-A. Curr. Opin. Struct. Biol. 1:99-103(1991).

[ 3] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995).

[ 4] Roca J. Trends Biochem. Sci. 20:156-160(1995).

[0466] 141. (DSPc) Tyrosine specific protein phosphatases signature and profiles

Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a
phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth,
proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified
into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The currently
known PTPases are listed below.

Soluble PTPases: ... - PTPN2 (T-cell PTPase; TC-PTP). - PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an
N-terminal band 4.1-like domain (see <PDOC00566>) and could act at junctions between the membrane and cytoskeleton.
- PTPN5 (STEP). - PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of
the SH2 domain at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.
- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP). - PTPN8 (70Z-PEP). - PTPN9 (MEG2). - PTPN12
(PTP-G1; ...). - ..., which may be involved in the ubiquitin-mediated protein degradation pathway. - Fission yeast pyp1
and pyp2. - Yersinia virulence plasmid PTPases (gene yopH). - Autographa californica nuclear polyhedrosis virus 19 Kd
PTPase.

Dual specificity PTPases: - DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on
both Thr-183 and Tyr-185. - DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both
Thr and Tyr residues. - DUSP3 (VHR). - DUSP4 (HVH2). - DUSP5 (HVH3). - DUSP6 (Pyst1; MKP-3). - DUSP7 (Pyst2; MKP-X).
- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3. - Yeast YVH1. - Vaccinia virus H1 PTPase, a dual
specificity phosphatase.

Receptor PTPases: Structurally, all known receptor PTPases are made up of a variable length extracellular domain,
followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain
fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains in
their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first seems
to have enzymatic activity, while the second is inactive but seems to affect substrate specificity of the first. In
these domains, the catalytic cysteine is generally conserved, but some other, presumably important, residues are not.

Number of domains of each type (Ig: immunoglobulin-like; FN-III: fibronectin type III; CAH: carbonic anhydrase-like;
MAM: MAM domain; PTPase: cytoplasmic PTPase domains):

Name                               Ig   FN-III   CAH   MAM   PTPase
Leukocyte common antigen (LCA)     ...
Drosophila DLAR                     3      9      0     0      2
Drosophila DPTP                     2      2      0     0      2
PTP-alpha (LRP)                     0      0      0     0      2
PTP-beta                            0     16      0     0      1
PTP-gamma                           0      1      1     0      2
PTP-delta                           0     >7      0     0      2
PTP-epsilon                         0      0      0     0      2
PTP-kappa                           1    ...

... There are two conserved cysteines; the second one has been shown to be absolutely required for activity.
Furthermore, a number of conserved residues in its immediate vicinity have also been shown to be important. A signature
pattern for PTPase domains was derived, centered on the active site cysteine. There are three profiles for PTPases: the
first one spans the complete domain ... specific to dual-specificity PTPases, and the ...
[0467] Consensus pattern: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue]
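Because the cytoplasmic region of receptor PTPases generally carries two PTPase domains, it can be useful to collect every non-overlapping hit of the signature rather than only the first, reporting one active-site Cys per domain that retains the motif. This is a minimal sketch: the regex is a hand translation of the [0467] pattern, the names are hypothetical, the sequence is an invented tandem of one made-up 13-residue segment, and a domain whose motif has diverged would simply not be reported.

```python
import re

# Hand translation of: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]
# The annotated active-site Cys is captured in the named group "cys".
PTP_SIGNATURE = re.compile(r"[LIVMF]H(?P<cys>C).{2}G.{3}[STC][STAGP].[LIVMFY]")


def active_site_cysteines(seq):
    """1-based positions of the annotated Cys for every non-overlapping hit."""
    return [m.start("cys") + 1 for m in PTP_SIGNATURE.finditer(seq)]


unit = "VHCSAGVGRSGTF"  # invented 13-residue segment that fits the pattern
tandem = "MEK" + unit + "PPPPP" + unit + "DD"
print(active_site_cysteines(tandem))  # [6, 24]
```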

[ 1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[ 2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[ 3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[ 4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[ 5] Hunter T. Cell 58:1013-1016(1989).

[0468] 142. (DUF10) Uncharacterized protein family UPF0076 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarities: - Goat antigen UK114, a
human homolog and the rat corresponding protein, which is known as perchloric acid soluble protein (PSP). PSP [2]
may inhibit an initiation stage of cell-free protein synthesis. - Mouse heat-responsive protein HRSP12. - Yeast ...
- Yeast chromosome IX hypothetical protein YIL051c. - Caenorhabditis ... - Escherichia coli hypothetical protein ycdK.
- Escherichia coli hypothetical protein yhaR. - Escherichia coli hypothetical protein yjgF and HI0719, the
corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein yoaB. - Bacillus subtilis
hypothetical protein yabJ. - Haemophilus influenzae ... - Synechocystis strain PCC 6803 hypothetical protein slr0709.
- Rhizobium strain NGR234 ... - ... hypothetical protein PH0854. These are small proteins ... well conserved. As a
signature pattern, a well conserved region located in the C-terminal part of these proteins was selected.

[0469] Consensus pattern: [PA]-[ASTPV]-R-[SACVF]-x-[LIVMFY]-x(2)-[GSAKR]-x-[LMVAW]...
[ 1] Bairoch A. Unpublished observations (1995).

[ 2] ... Sakai K., Hong Y.-M., Suzuki I., Munoz S., Natori Y. J. Biol. Chem. 270:30060-30067(1995).

[0470] 143. (DUF3) Domain of Unknown Function 3

Domain apparently occurring exclusively in eubacteria. Unknown function.

[0471] 144. (DUF6) Integral membrane protein


[0473] 145. (DUF7) Integral membrane protein

[0474] This family includes many hypothetical membrane proteins of unknown function. Swiss:P145Q2 has been 
implicated in resistance to ethidium bromide.

[0475] 146. (DapB) Dihydrodipicolinate reductase signature 

Dihydrodipicolinate reductase (EC 1.3.1.26) catalyzes the second step in the biosynthesis of diaminopimelic acid and
lysine, the NAD- or NADP-dependent reduction of 2,3-dihydrodipicolinate into 2,3,4,5-tetrahydrodipicolinate. This
enzyme is present in bacteria (gene dapB) and higher plants. As a signature pattern, the best conserved region in this
enzyme was selected. It is located in the central section and is part of the substrate-binding region [1].
[0476] Consensus pattern: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A-
[0477] [ 1] Scapin G., Blanchard J.S., Sacchettini J.C. Biochemistry 34:3502-3512(1995).
[0478] 147. DedA family

This family combines the DedA related proteins and the ... family. Members of this family are not functionally
characterised. These proteins contain multiple predicted transmembrane regions.
[0480] 148. DegT/DnrJ/EryC1/StrS family 

... characteristics of the sensor protein of two-component signal transduction ... protein kinases. The members
of this family do have the typical helix-turn-helix motif of DNA binding proteins.

[0482] [1] Stutzman-Engwall KJ, Otten SL, Hutchinson CR; J Bacteriol 1992;174:144-154.

[0483] 149. (Desaturase) Fatty acid desaturases signatures 

Fatty acid desaturases are enzymes that catalyze the insertion of a double bond at the delta position of fatty acids.
There seem to be two distinct families of fatty acid desaturases which do not seem to be evolutionarily related.
Family 1 is composed of: - Stearoyl-CoA desaturase (SCD) (EC 1.14.99.5) [1]. SCD is a key regulatory enzyme of
unsaturated fatty acid biosynthesis. SCD introduces a cis double bond at the delta(9) position of fatty acyl-CoAs.
SCD is a membrane-bound enzyme that is thought to function as a part of a multienzyme complex in the endoplasmic
reticulum of vertebrates and fungi. As a signature pattern for this family, a conserved region ... of these enzymes
was selected; this region is rich in histidine residues and in aromatic residues. Family 2 is composed of: - Plant
stearoyl-acyl-carrier-protein desaturase (EC 1.14.99.6) [2]; these enzymes catalyze the introduction of a double bond
at the delta(9) position of stearoyl-ACP to produce oleoyl-ACP. This enzyme is responsible for the conversion of
saturated fatty acids to unsaturated fatty acids in the synthesis of vegetable oils. - Cyanobacterial desA [3], an
enzyme that can introduce a second cis double bond at the delta(12) position of fatty acids bound to membrane
glycerolipids. DesA is involved in chilling tolerance; the phase transition temperature of the lipids of cellular
membranes depends on the degree of unsaturation of the fatty acids of the membrane lipids. As a signature pattern
for this family, a conserved region ... of these enzymes was selected.

[0484] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y-

Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]-

[ 1] Kaestner K.H., Ntambi J.M., Kelly T.J. Jr., Lane M.D. J. Biol. Chem. 264:14755-14761(1989).
[ 2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[ 3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0485] 150. Dihydroorotase signatures 

Dihydroorotase (EC 3.5.2.3) (DHOase) catalyzes the third step in the de novo biosynthesis of pyrimidine, the
conversion of ureidosuccinic acid (N-carbamoyl-L-aspartate) into dihydroorotate. Dihydroorotase binds a zinc ion which
is required for its catalytic activity [1]. In bacteria, DHOase is a dimer of identical chains of about 400 amino-acid
residues (gene pyrC). In higher eukaryotes, DHOase is part of a large multi-functional protein known as 'rudimentary'
in Drosophila and CAD in mammals, which catalyzes the first three steps of pyrimidine biosynthesis [2]. The DHOase
domain is located in the central part of this polyprotein. In yeasts, DHOase is encoded by a monofunctional protein
(gene URA4). However, a defective DHOase domain [3] is found in a multifunctional protein (gene URA2) that catalyzes
the first two steps of pyrimidine biosynthesis. The comparison of DHOase sequences from various sources shows [4] that
there are two conserved regions ... C-terminal part. Signature patterns for both regions were developed.

Allantoinase (EC 3.5.2.5) is the enzyme that hydrolyzes allantoin into allantoate. In yeast (gene DAL1) [5], it is the
first enzyme in the allantoin degradation pathway; in amphibians [6] and fish it catalyzes the second step in the
degradation of uric acid. The sequence of allantoinase is evolutionarily related to that of DHOases.

Consensus pattern: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGANF] [The two H's are probable zinc ligands]

Consensus pattern: [GA]-[ST]-D-x-A-P-H-x(4)-K-

[ 1] Brown D.C., Collins K.D. J. Biol. Chem. 266:1597-1604(1991). 
[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 3] ... S. Gene 79:59-70.

[0487] 151. dnaJ domains signatures and profile

+------------+---------+------------+------------+
| N-terminal | Gly-R   | CXXCXGXG   | C-terminal |
+------------+---------+------------+------------+

a) Proteins containing both a 'J' and a 'CRR' domain: 

- Yeast protein MAS5/YDJ1, which seems to be involved in mitochondrial protein import.

- Yeast protein MDJ1, involved in mitochondrial biogenesis and protein folding.

- Yeast protein SCJ1, involved in protein sorting.

- Yeast protein XDJ1.

- Plants dnaJ homologs (from leek and cucumber).

- Human HDJ2, a dnaJ homolog of unknown function.

- Yeast hypothetical protein YNL077w.

b) Proteins containing a 'J' domain without a 'CRR' domain:

- Rhizobium fredii nolC, a protein involved in cultivar-specific nodulation of soybean.
- Escherichia coli cbpA [3], a protein that binds curved DNA.
- Yeast protein CAJ1.
- Yeast hypothetical protein YFR041c.
- Yeast hypothetical protein YIR004w.
- Yeast hypothetical protein YJL162c.
- Human HDJ1.
- Human HSJ1, a neuronal protein.
- Drosophila cysteine-string protein (csp).

[0491] Consensus pattern: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-...
Consensus pattern: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-...

[1] Cyr D.M., Langer T., Douglas M.G. Trends Biochem. Sci. 19:176-181(1994).

[2] ... Trends Biochem. Sci. 17:129-129(1992).

[3] Ueguchi C., Kaneda M., Yamada H., Mizuno T. Proc. Natl. Acad. Sci. U.S.A. 91:1054-1058(1994).
[0492] 152.
[0493] 153. Dwarfin

...

... RW, Wang XF. Proc Natl Acad Sci U S A 1996;93 ...

... Nature 1997;388:304-308.

[0496] 154. Dynein light chain type 1 signature 

... light chain ... cytoplasmic dynein ...

Consensus pattern: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E-

[ 1] King S.M., Patel-King R.S. J. Biol. Chem. 270:11445-11449(1995).

[ 2] Dick T., Ray K., Salz H.K., Chia W. Mol. Cell. Biol. 16:1966-1977(1996).

[0497] 155. dUTPase 

[0498] dUTPase hydrolyzes dUTP to dUMP and pyrophosphate.

[0499] 156. Cytidine and deoxycytidylate deaminases zinc-binding region signature

Consensus pattern: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,...)-P-C-x(2)-...

[ 1] Yang C., Carlow D., Wolfenden R., Short S.A. Biochemistry 31:4168-4174(1992).

[ 2] ... 268:2288-2291(1993).

[ 3] ... Saier M.H. Jr. Protein Sci. 3:853-856(1994).

[ 4] Bhattacharya S., Navaratnam N., Morrison J.R., Scott J., Taylor W.R. Trends Biochem. Sci. 19:105-106(1994).

[0502] 157. Dehydrins signatures 

- Cotton LEA protein D-11.

- Craterostigma plantagineum desiccation-related proteins A and B.

- Maize dehydrin M3 (RAB-17).

- Pea dehydrins DHN1, DHN2, and DHNa.
- Radish LEA protein.

- Rice proteins RAB16B, 16C, 16D, RAB21, and RAB25.

- Tomato TAS14.

- Wheat dehydrin RAB15 and cold-shock proteins cor410, cs66 and cs120.

[0503] Dehydrins share a number of structural features. One of the most notable features is the presence, in their
sequence, of a run of serines followed by a cluster of charged residues. Such a region has ... A second conserved
feature is the presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of
charged residues ... Signature patterns for both regions were developed.

[0504] Consensus pattern: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4)
Consensus pattern: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G-
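Dehydrins are described as carrying two copies of the lysine-rich octapeptide, so the second [0504] pattern can be used to count copies in a sequence. This is an illustrative sketch only: the regex is a hand translation of that pattern, the names `K_SEGMENT` and `k_segment_starts` are hypothetical, and the sequence is invented.

```python
import re

# Hand translation of the lysine-rich octapeptide:
#   [KR]-[LIM]-K-[DE]-K-[LIM]-P-G
K_SEGMENT = re.compile(r"[KR][LIM]K[DE]K[LIM]PG")


def k_segment_starts(seq):
    """1-based start positions of every octapeptide match."""
    return [m.start() + 1 for m in K_SEGMENT.finditer(seq)]


example = "MSSSKIKEKLPGHHTTKIKDKMPGQ"  # invented sequence
starts = k_segment_starts(example)
print(starts)            # [5, 17]
print(len(starts) >= 2)  # True
```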

[1] Close T.J., Kortt A.A., Chandler P.M. Plant Mol. Biol. 13:95-108(1989).
[2] Robertson M., Chandler P.M. Plant Mol. Biol. 19:1031-1044(1992).

[3] ... 51:4774...

[0505] 158. (deoR) Bacterial regulatory proteins, deoR family signature

The many bacterial transcription regulation proteins which bind DNA through a 'helix-turn-helix' motif can be classified
into subfamilies on the basis of sequence similarities. One of these subfamilies groups the following proteins [1,2]:
- accR, the Agrobacterium tumefaciens plasmid pTiC58 repressor of opine catabolism and conjugal transfer. - agaR,
the Escherichia coli aga operon putative repressor. - deoR, the Escherichia coli deoxyribose operon repressor. - fucR,
the Escherichia coli L-fucose operon activator. - gatR, the Escherichia coli galactitol operon repressor. - glpR, the
Escherichia coli glycerol-3-phosphate regulon repressor. - ..., the Escherichia coli glucitol operon repressor.
- iolR from Bacillus subtilis. - lacR, the streptococci lactose phosphotransferase system repressor. - spoIIID, the Bacillus
subtilis transcription regulator of the sigK gene. - yfjR, an Escherichia coli hypothetical protein. - yg..., an Escherichia
coli hypothetical protein. - yihW, an Escherichia coli hypothetical protein.
- yjhJ, an Escherichia coli hypothetical protein. The 'helix-turn-helix' DNA-binding motif of these proteins is located in
the N-terminal part of the sequence. The pattern used to detect these proteins starts fourteen residues before the HTH
motif and ends one residue after it.

[0506] Consensus pattern: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-[LIVMF]
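Because the deoR-family pattern begins fourteen residues before the helix-turn-helix motif and ends one residue after it, the HTH coordinates can be derived directly from a pattern match. The sketch below hand-translates the pattern into a Python regular expression; the function name `hth_span` and the synthetic test sequence are illustrative assumptions, and only the first match in a sequence is reported.

```python
import re

# Hand translation of the deoR-family consensus pattern:
# R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-[LIVMF]
DEOR_PATTERN = re.compile(r"R.{3}[LIVM].{3}[LIVM].{16,17}[STA].{2}T[LIVMA][RH][KRNA]D[LIVMF]")

LEADING_RESIDUES = 14   # the pattern starts fourteen residues before the HTH motif
TRAILING_RESIDUES = 1   # and ends one residue after it


def hth_span(sequence: str):
    """Return the 1-based, inclusive (start, end) of the putative HTH motif, or None."""
    m = DEOR_PATTERN.search(sequence)
    if m is None:
        return None
    first = m.start() + LEADING_RESIDUES + 1   # 0-based index -> 1-based position
    last = m.end() - TRAILING_RESIDUES         # 0-based exclusive end == 1-based inclusive end
    return first, last
```

Since the spacer is x(16,17), the pattern spans 34 or 35 residues and the reported motif is correspondingly 19 or 20 residues long.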

[1] von Bodman S., Hayman G.T., Farrand S.K. Proc. Natl. Acad. Sci. U.S.A. 89:643-647(1992).
[2] Bairoch A. Unpublished observations (1993).

[0507] 159. dsrm
Double-stranded RNA binding motif 

Scien^^9S:6?522T'''"'^ """^^ 

[0508] Sequences gathered for seed by HMM_iterative_training. Putative motif shared by proteins that bind to dsRNA. At least some DSRM proteins seem to bind to specific RNA targets. Examples include Staufen, which is involved in the localization of at least five different mRNAs in the early Drosophila embryo, and the interferon-induced protein kinase in humans, which is part of the cellular response to dsRNA.
[0509] Number of members: 116
[0510] 160. Dynamin family signature 

Dynamin [1,2] is a microtubule-associated force-producing protein of 100 Kd which is involved in the production of 
microtubule bundles and which is able to bind and hydrolyze GTP. Dynamin is structurally related to the following proteins:

- Drosophila shibire protein [3], which is the homolog of mammalian dynamin.

- Yeast protein VPS1 [4], which is involved in vacuolar protein sorting and possibly in microtubule-associated motility.

- Yeast protein MGM1 [5], which is required for mitochondrial genome maintenance.

- Yeast protein DNM1, which is involved in endocytosis.

- Interferon-induced Mx proteins [6,7]. Interferon alpha or beta induce the synthesis of a family of proteins, the Mx proteins. Some Mx proteins are known to confer resistance to influenza viruses and/or rhabdoviruses on transfected mammalian cells.

The regions these proteins share with other GTP-binding proteins are located in the N-terminal part of these proteins. As a signature pattern, a conserved region located downstream of the ATP/GTP-binding motif 'A' (P-loop) (see <PDOC00017>) was selected.
[0511] Consensus pattern: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R

[1] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).

[2] Obar R.A., Collins C.A., Hammarback J.A., Shpetner H.S., Vallee R.B. Nature 347:256-261(1990).

[3] van der Bliek A.M., Meyerowitz E.M. Nature 351:411-414(1991).

[4] Rothman J.H., Raymond C.K., Gilbert T., O'Hara P.J., Stevens T.H. Cell 61:1063-1074(1990).

[5] Jones B.A., Fangman W.L. Genes Dev. 6:380-389(1992).

[6] Arnheiter H., Meier E. New Biol. 2:851-857(1990).

[7] Staeheli P., Pitossi F., Pavlovic J. Trends Cell Biol. 3:268-272(1993).

[0512] 161. (dynamin_2) Dynamin central region
- Plasma membrane (H+) ATPases [reviewed in 4].
- Sodium/potassium (Na+, K+) ATPases (sodium pump) [reviewed in 5,6].
- Gastric (K+, H+) ATPases (proton pump).
- Calcium (Ca++) ATPases of the endoplasmic reticulum.
- Copper (Cu++) ATPases (copper pump), which are involved in two human genetic disorders: Menkes syndrome and Wilson disease [7].
- Bacterial cadmium efflux (Cd++) ATPases [reviewed in 8].
- Bacterial magnesium (Mg++) ATPases.
- A probable cation ATPase from Leishmania.
- fixI, an ATPase involved in nitrogen fixation.

The region around the phosphorylated aspartate residue is conserved in all these ATPases and can be used as a signature pattern.

[0515] Consensus pattern: D-K-T-G-T-[LI]-[TI] [D is phosphorylated]

[1] Green N.M., MacLennan D.H. Biochem. Soc. Trans. 17:819-822(1989).

[2] Green N.M. Biochem. Soc. Trans. 17:970-972(1989).

[3] Fagan M.J., Saier M.H. Jr. J. Mol. Evol. 38:57-99(1994).

[4] Serrano R. Biochim. Biophys. Acta 947:1-28(1988).

[5] Fambrough D.M. Trends Neurosci. 11:325-328(1988).

[6] Sweadner K.J. Biochim. Biophys. Acta 988:185-220(1989).

[7] Bull P.C., Cox D.W. Trends Genet. 10:246-251(1994).

[8] Silver S., Nucifora G., Chu L., Misra T.K. Trends Biochem. Sci. 14:76-80(1989).

[0516] 163. E1_N 

E1 Protein, N terminal domain 

Number of members: 90 

[0517] 164. (E1_dehydrog) Dehydrogenase E1 component
[0519] 165. (ECH) Enoyl-CoA hydratase/isomerase signature 

Enoyl-CoA hydratase (EC 4.2.1.17) (ECH) [1] and 3-2trans-enoyl-CoA isomerase (EC 5.3.3.8) (ECI) [2] are two enzymes involved in fatty acid metabolism. ECH catalyzes the hydration of 2-trans-enoyl-CoA into 3-hydroxyacyl-CoA, and ECI shifts the 3-double bond of the intermediates of unsaturated fatty acid oxidation to the 2-trans position. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In mitochondria, ECH and ECI are separate, yet structurally related, monofunctional enzymes. Peroxisomes contain a trifunctional enzyme [3] consisting of an N-terminal domain that bears both ECH and ECI activity, and a C-terminal domain responsible for 3-hydroxyacyl-CoA dehydrogenase (HCDH) activity. In Escherichia coli (gene fadB) [4], ECH is part of a multifunctional enzyme which contains both a HCDH and a 3-hydroxybutyryl-CoA epimerase domain.

Other proteins structurally related to ECH/ECI include:
- 3-hydroxybutyryl-CoA dehydratase (EC 4.2.1.55) (crotonase), a bacterial enzyme involved in the butyrate/butanol-producing pathway.
- Naphthoate synthase (EC 4.1.3.36) (DHNA synthetase) (gene menB) [5], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2). DHNA synthetase converts O-succinyl-benzoyl-CoA (OSB-CoA) to 1,4-dihydroxy-2-naphthoic acid (DHNA).

As a signature pattern, a conserved region rich in glycine and hydrophobic residues was selected.

[1] Minami-Ishii N., Taketani S., Osumi T., Hashimoto T. Eur. J. Biochem. 185:73-78(1989).
[2] Mueller-Newen G., Stoffel W. Biol. Chem. Hoppe-Seyler 372:613-624(1991).
[3] Palosaari P.M., Hiltunen J.K. J. Biol. Chem. 265:2446-2449(1990).
[4] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).
[5] Driscoll J.R., Taber H.W. J. Bacteriol. 174:5063-5071(1992).

[6] Babbitt P.C., Kenyon G.L., Martin B.M., Charest H., Sylvestre M., Scholten J.D., Chang K.-H., Liang P.-H., Dunaway-Mariano D. Biochemistry 31:5594-5604(1992).
[7] Beckman D.L., Kranz R.G. Gene 107:171-172(1991).

[8] Eichler K., Bourgis F., Buchet A., Kleber H.-P., Mandrand-Berthelot M.-A. Mol. Microbiol. 13:775-786(1994).
[0521] 166. (EF1BD) Elongation factor 1 beta/beta'/delta chain signatures

Eukaryotic elongation factor 1 (EF-1) is responsible for the GTP-dependent binding of aminoacyl-tRNAs to the ribosomes. EF-1 is composed of the alpha chain, which binds GTP and aminoacyl-tRNAs, the gamma chain, and the beta and delta chains. The beta and delta chains are highly similar proteins that both stimulate the exchange of GDP bound to the alpha chain for GTP [2]. The beta and delta chains are hydrophilic proteins of around 23 to 31 Kd. The C-terminal part seems important for the nucleotide exchange activity, while the N-terminal section is probably involved in the interaction with the gamma chain. Two signature patterns were developed for this family of proteins, the second corresponding to a region close to the C-terminal extremity of these proteins.

[0522] Consensus pattern: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G
Consensus pattern: [IV]-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM]

[1] Riis B., Rattan S.I.S., Clark B.F.C., Merrick W.C. Trends Biochem. Sci. 15:420-424(1990).

loS'?S7(i99j)^' ^" '^"^^^ ^'°p^y=- 

[0523] 167. (EF1G_domain) Elongation factor 1 gamma, conserved domain
[0524] 168. (EFG_C) Elongation factor G C-terminus

This domain is found associated with GTP_EFTU. This family includes the carboxyl-terminal regions of elongation factor G, elongation factor 2 and some tetracycline resistance proteins.
[0526] 169. (EFP) Elongation factor P signature 

Elongation factor P (EF-P) [1] is a prokaryotic protein translation factor required for efficient peptide bond synthesis. A conserved region of these proteins was selected as a signature pattern.

[0527] Consensus pattern: K-x-[AV]-x(4)-G-x(2)-[LIV]-x-V-P-x(2)-[LIV]-x(2)-G
[1] Aoki H., Adams S.-L., Turner M.A., Ganoza M.C. Biochimie 79:7-11(1997).
[0528] 170. (EF_TS) Elongation factor Ts signatures

Elongation factor Ts (EF-Ts) [1] is a protein that participates in the elongation cycle of protein biosynthesis. It associates with the EF-Tu.GDP complex and induces the exchange of GDP to GTP; it remains bound to the aminoacyl-tRNA.EF-Tu.GTP complex up to the GTP hydrolysis stage on the ribosome. EF-Ts is also encoded in the genome of some algal chloroplasts [2] and is present in mitochondria [3]. Signature patterns were selected from conserved regions of the protein.

Consensus pattern: L-R-x(2)-T-[GSDNQ]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-[KRNEQS]-A-L

[0530] Consensus pattern: E-[LIVM]-[NV]-[SCV]-[QE]-T-D-F-V-[SA]-[KRN]

[1] Bubunenko M.G., Kireeva M.L., Gudkov A.T. Biochimie 74:419-425(1992).
[2] Kostrzewa M., Zetsche K. Plant Mol. Biol. 23:67-76(1993).

[3] Xin H., Woriax V.L., Burkhart W.A., Spremulli L.L. J. Biol. Chem. 270:17243-17249(1995).
[0531] 171. (EMP24_GP25L) emp24/gp25L/p24 family 

Members of this family are implicated in bringing cargo forward from the ER and binding to coat proteins by their cytoplasmic domains. Number of members: 30
[1] ... Bergeron JJ, Nilsson T. J Cell Biol 1998;140:751-765
172. ENV polyprotein

ENV polyprotein (coat polyprotein)

Number of members: 224 

[0534] 173. (ERG4_ERG24) Ergosterol biosynthesis ERG4/ERG24 family signatures

These enzymes are involved in ergosterol biosynthesis and act by reducing double bonds in precursors of ergosterol.

[0535] Consensus pattern: G-x(2)-[LIVM]-[YH]-D-x-[FYW]-x-G-x(2)-L-N-P-R

Consensus pattern: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G

[1] ... Goebl M., Carter G.T., Kirsch D.R. Gene 140:41-49(1994).
[0536] 174. (ERM) Ezrin/radixin/moesin family

tHt ol^llTpSnr ' ^ '-ily represents 

[1] ... Tsukita S. J Cell Biol 1998;140:885-895

[0539] 175. ER lumen protein retaining receptor signatures

Proteins resident in the lumen of the endoplasmic reticulum (ER) contain a C-terminal tetrapeptide (generally K-D-E-L or H-D-E-L).

Th.s»onapane..ap,te«,cons.™edd<^ep«te,«co™»on*,omec^^ 

[0540] Consensus pattern: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y
Consensus pattern: L-E-[SA]-V-A-I-[LM]-P-Q-L
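The two patterns above describe the receptor itself. The signal it recognizes sits at the very C-terminus of resident ER proteins, generally the tetrapeptide K-D-E-L or H-D-E-L, so a first-pass check only needs to examine the last four residues. A minimal sketch, assuming those two tetrapeptides are the only signals of interest (other variants are known, so this is an illustration rather than a predictor); `has_er_retention_signal` is a hypothetical name.

```python
# Assumption: only the two tetrapeptides named in the text are recognised.
RETENTION_SIGNALS = ("KDEL", "HDEL")


def has_er_retention_signal(protein: str) -> bool:
    """True if the sequence ends in K-D-E-L or H-D-E-L (a trailing stop '*' is ignored)."""
    # str.endswith accepts a tuple, so both signals are tested in one call.
    return protein.strip().rstrip("*").upper().endswith(RETENTION_SIGNALS)
```

Checking only the terminus matters: an internal K-D-E-L (as in `"MKDELAA"`) does not count as a retention signal.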

[1] Pelham H.R.B. Curr. Opin. Cell Biol. 3:585-591(1991).

[2] Townsley F.M., Wilson D.W., Pelham H.R.B. EMBO J. 12:2821-2829(1993).

[0541] 176. (ETF_beta) Electron transfer flavoprotein beta-subunit signature

Electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial dehydrogenases.

[0542] Consensus pattern: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-[LIV]-x(4)-P-...

[1] ... 321:637-652(1990).
[ 2] Tsai M.H.. Saier M.H. Jr. Res. Microbiol. 146:397-404(1995). 

[0543] 177. Endonuclease III signatures
... Nucleic Acids Res. ...:3307-3312(1996).
141 Nootog J , ^ Eedm F.J.M.. Egg™, RIL, d, to w.„. Nucleio Aci« SU,.6507(1992,. 

178. NAD dependent epimerase/dehydratase family

[1] ... Frey PA, Holden HM. Biochemistry 1997;36:

[0548] 179. Exonuclease 

[0550] [1] Koonin EV, Deutscher MP. Nucleic Acids Res 1993;21:2521-2522
[0551] 180. ENTH

ENTH domain 

[0552] [1] Kay BK, Yamabhai M, Wendland B, Emr SD. Protein Sci 1999;8:435-438
[0554] Number of members: 29 

[0555] 181. (eIF-1A) Eukaryotic initiation factor 1A signature

As a signature pattern, a conserved region in the central section of these proteins was selected.

I M i n i'i T""^ ■ ^^^^^^y-^ W B. J. Biol. Chem P7n...7«JoUoVn..^ 

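[0558] 182. (eIF-5A) Eukaryotic initiation factor 5A hypusine signature

Eukaryotic initiation factor 5A (eIF-5A) (formerly known as eIF-4D) [1,2] contains the unusual amino acid hypusine, which is derived from a lysine residue by post-translational modification and is essential to the function of eIF-5A. Homologous proteins are also found in archaebacteria such as Sulfolobus acidocaldarius and Methanococcus. The signature pattern covers the region around the hypusine residue.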

[0559] Consensus pattern: [PT]-G-K-H-G-x-A-K [The first K is modified to hypusine]
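The bracketed note on the pattern marks which residue carries the modification, so when scanning sequences the hypusine lysine can be located as a fixed offset inside each match: it is the third residue of [PT]-G-K-H-G-x-A-K. A minimal Python sketch; `hypusine_sites` is a hypothetical helper and the test sequence is synthetic.

```python
import re

# Consensus pattern [PT]-G-K-H-G-x-A-K written as a regular expression.
EIF5A_PATTERN = re.compile(r"[PT]GKHG.AK")
MODIFIED_LYSINE_OFFSET = 2   # the first K is the third residue of each match (0-based offset 2)


def hypusine_sites(sequence: str):
    """Return 1-based positions of the lysine expected to be modified to hypusine."""
    return [m.start() + MODIFIED_LYSINE_OFFSET + 1 for m in EIF5A_PATTERN.finditer(sequence)]
```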
[1] Park M.H., Wolff E.C., Folk J.E. Biofactors 4:95-104(1993).

[2] Schnier J., Schwelberger H.G., Smit-McBride Z., Kang H.A., Hershey J.W.B. Mol. Cell. Biol. 11:3105-3114(1991).
[0560] 183. (efhand) S-100/ICaBP type calcium binding protein signature

The vitamin-D dependent intestinal calcium-binding protein (ICaBP) ... proteins, but it does not form dimers. In the past years the sequences of many proteins of this family have been determined (for reviews see [2,3,4]) ... It is becoming clear that they are involved in differentiation, cell cycle regulation and metabolic control. These proteins are:

Pits SrC™^^^^^^ «^°°-«)- - Ca-P-n . light chain (piO; 

Calgranulin B (MIF related'proteh U (£ fr^ir^^^^^^^^ ' ^^O^^^S). - 

calcium-binding protein (CAPL) (18a2- peL98 42a ^K- M^^^^^^^ ' <^^'9'""3"" (S100C). - Placental 

Protein S-IOOE (S100A3). - Pritein S m^^^T'sm^rT I V^^'^' ' ^^^^^^^S) " 

(S100A7). - Chemotactic cytokine CP-10 [5]. - Protein MRP-126 [6]. - Trichohyalin [7]. This is a large intermediate filament-associated protein that contains a S-100 type domain in its N-terminal extremity.

Our EF-hand detecting pattern will fail to pick up some of these proteins. A specific pattern was therefore developed which unambiguously picks up all the proteins belonging to this family. This pattern spans the region of the EF-hand high affinity site.

[0561] Consensus pattern: ...-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-[LIVMF]

[1] Baudier J. (In) Calcium and Calcium Binding Proteins, Gerday C., Bolis L., Gilles R., Eds., pp102-113, Springer-Verlag, Berlin, (1988).

[2] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).
[3] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).

[5] ... J. Biol. Chem. 267:7499-7504(1992).
[6] Nakano T., Graf T. Oncogene 7:527-534(1992).

[7] Lee S.-C., Kim I.-G., Marekov L.N., O'Keefe E.J., Parry D.A.D., Steinert P.M. J. Biol. Chem. 268:12164-12176(1993).

EF-hand calcium-binding domain 

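Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand. This type of domain consists of a twelve residue loop flanked on both sides by a twelve residue alpha-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand).

Listed below are the proteins which are known to contain EF-hand regions. For each type of protein, the total number of EF-hand regions known or supposed to exist is indicated between parentheses. This number does not include regions which clearly have lost their calcium-binding properties, or the atypical low-affinity site (which spans thirteen residues) found in the S-100/ICaBP family of proteins [6].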

- Aequorin and Renilla luciferin binding protein (LBP) (Ca=3) 
- Alpha actinin (Ca=2).
- Calbindin (Ca=4).

- Calcineurin B subunit (protein phosphatase 2B regulatory subunit) (Ca=4).

- Calcium-binding protein from Streptomyces erythraeus (Ca=3?).
- Calcium-binding protein from Schistosoma mansoni (Ca=2?).

- Calcium vector protein from amphioxus (Ca=2).
- Calcyphosin (thyroid protein p24) (Ca=4?).
- Calmodulin (Ca=4, except in yeast where Ca=3).
- Calpain small and large chains (Ca=2).
- Calretinin (Ca=6).
- Calcyclin (prolactin receptor associated protein) (Ca=2).
- Caltractin (centrin) (Ca=2 or 4).

- Cell Division Control protein 31 (gene CDC31) from yeast (Ca=2?).
- Diacylglycerol kinase (EC 2.7.1.107) (DGK) (Ca=2).

- FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) from mammals (Ca=1).
- Fimbrin (plastin).

- Flagellar calcium-binding protein (1f8) from Trypanosoma cruzi (Ca=1 or 2).

- Guanylate cyclase activating protein (GCAP) (Ca=3).

- Inositol phospholipid-specific phospholipase C isozymes gamma-1 and delta-1 (Ca=2) [10].
- Intestinal calcium-binding protein (ICaBPs) (Ca=2).

- MIF related proteins 8 (MRP-8 or CFAG) and 14 (MRP-14) (Ca=2).

- Myosin regulatory light chains (Ca=1).
- Oncomodulin (Ca=2).

- Osteonectin (basement membrane protein BM-40) (SPARC) and proteins that contain an 'osteonectin' domain (QR1, matrix glycoprotein SC1) (see the entry <PDOC...>).

- Placental calcium-binding protein (18a2) (nerve growth factor induced protein 42a).
- Recoverins (visinin, hippocalcin, neurocalcin, S-modulin) (Ca=2 to 3).

- Reticulocalbin (Ca=4).
- S-100 protein, alpha and beta chains (Ca=2).

- Sarcoplasmic calcium-binding protein (SCPs) (Ca=2 to 3).

- Sea urchin proteins Spec 1 (Ca=4), Spec 2 (Ca=4?), LpS1 (Ca=8).

" f^rCLXIJnTl,' • 

- Squidulin (optic lobe calcium-binding protein) from squid (Ca=4) 

- Troponins C: from skeletal muscle (Ca=4), from cardiac muscle (Ca=3), from arthropods and molluscs (Ca=2).

mi::atw;:ar:agr^^^^^^ 

A pattern was developed which takes into account the complete EF-hand loop as well as the first residue which follows the loop and which always seems to be hydrophobic.

Consensus pattern: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]

- Note: positions 1 (X), 3 (Y) and 12 (-Z) are the most conserved.

- Note: the pattern will, in some cases, miss one of the EF-hand regions in some proteins with multiple EF-hand regions.
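The loop geometry described above can be made concrete with a short sketch. It hand-translates the EF-hand consensus pattern, read here as D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW] (this reading should be checked against the pattern given above), into a regular expression and labels the six calcium-coordinating loop positions 1, 3, 5, 7, 9 and 12 as X, Y, Z, -Y, -X and -Z. The function name `ef_hand_ligands` is hypothetical; the test fragment is built around the first EF-hand loop of calmodulin (DKDGDGTITTKE).

```python
import re

# EF-hand consensus pattern, hand-translated ({..} exclusions become [^..]).
EF_HAND = re.compile(r"D.[DNS][^ILVFYW][DENSTG][DNQGHRK][^GP][LIVMC][DENQSTAGC].{2}[DE][LIVMFYW]")

# Calcium-coordinating positions of the 12-residue loop and their conventional labels.
LIGAND_LABELS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}


def ef_hand_ligands(sequence: str):
    """For each EF-hand loop found, map each ligand label to (1-based position, residue)."""
    sites = []
    for m in EF_HAND.finditer(sequence):
        loop = m.group()[:12]            # the 13th matched residue is the hydrophobic follower
        start = m.start() + 1            # 1-based position of loop residue 1
        sites.append({label: (start + pos - 1, loop[pos - 1])
                      for pos, label in LIGAND_LABELS.items()})
    return sites
```

For the calmodulin loop, positions X, Y and Z are all Asp and -Z is the invariant Glu, consistent with the note that positions 1, 3 and 12 are the most conserved.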

^^^^-^^"^^^^^^ t B.H. cold spring Harbor Symp. 

[3] Moncrief N.D., Kretsinger R.H., Goodman M. J. Mol. Evol. 30:522-562(1990).

[4] Nakayama S., Moncrief N.D., Kretsinger R.H. J. Mol. Evol. 34:416-448(1992).

[5] Heizmann C.W., Hunziker W. Trends Biochem. Sci. 16:98-103(1991).
[6] Kligman D., Hilt D.C. Trends Biochem. Sci. 13:437-443(1988).

[7] ... 58:951-98(1989).
[8] Haiech J., Sallantin J. Biochimie 67:555-560(1985).

[9] Chauvaux S., Beguin P., Aubert J.-P., Bhat K.M., Gow L.A., Wood T.M., Bairoch A. Biochem. J. 265:261-265(1990).
[10] Bairoch A., Cox J.A. FEBS Lett. 269:454-456(1990).
[0562] 184. Enolase signature

Consensus pattern: [LIV](3)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA]
[1] ... J. Biol. Chem. 264:3685-3693(1989).

[2] Wistow G., Piatigorsky J. Science 236:1554-1556(1987).
[0564] 185. (F-actin_cap_A) F-actin capping protein alpha subunit signatures

[0565] Consensus pattern: V-H-[FY](2)-E-D-G-N-V
Consensus pattern: F-K-[AE]-L-R-R-x-L-P

[0566] [1] Cooper J.A., Caldwell J.E., Gattermeir D.J., Torres M.A., Amatruda J.F., Casella J.F. Cell Motil. Cytoskeleton 18:204-214(1991).

[0567] 186. F-box domain 

[0569] 187. F-protein 
Negative factor (F-protein) or Nef


[0573] 188. (FAD_binding_2) 

[0574] In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) (EC 1.3.5.1) is composed of two subunits: a FAD flavoprotein and an iron-sulfur protein.

[0577] 189. Fatty acid desaturases signatures (FA_desaturase)

Conserved regions of these enzymes were selected as signature patterns.

[0578] Consensus pattern: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y
Consensus pattern: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE]

[1] ... J. Biol. Chem. 264:14755-14761(1989).

[2] Shanklin J., Somerville C.R. Proc. Natl. Acad. Sci. U.S.A. 88:2510-2514(1991).
[3] Wada H., Gombos Z., Murata N. Nature 347:200-203(1990).

[0579] 190. Fructose-1,6-bisphosphatase active site (FBPase)

[1] Benkovic S.J., deMaine M.M. Adv. Enzymol. 53:45-82(1982).

[2] ... Dyer T.A. Eur. J. Biochem. 205:1053-1059(1992).
[ 3] Ke H.. Thorpe CM., Seaton B.A., Lipscomb W.N.. Marcus F. J. Mol. Biol. 212:513-539(1989). 

[0580] 191. FGGY family of carbohydrate kinases signatures

[0581] Consensus pattern: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-...

[4] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).

[5] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).
[6] Galat A. Eur. J. Biochem. 216:689-707(1993).

[7] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).

[8] Wuelfing C., Lombardero J., Plueckthun A. J. Biol. Chem. 269:2895-2901(1994).

[0586] 193. MAPEG family (aka: FLAP/GST2/LTC4S family signature)
[0587] The following mammalian proteins are evolutionarily related [1]:

- Leukotriene C4 synthase (EC 2.5.1.37) (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.
- Microsomal glutathione S-transferase II (EC 2.5.1.18) (GST-II) (gene GST2), an enzyme that can also produce LTC4 from LTA4.

- 5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of 5-lipoxygenase.

[0588] These are proteins of 150 to 160 residues that contain three transmembrane segments. As a signature pattern, a conserved region between the first and second transmembrane domains was selected.
[0589] Consensus pattern: G-x(3)-F-E-R-V-[...]-x-A-[NQ]-x-N-C

[0590] [1] Jakobsson P.-J., Mancini J.A., Ford-Hutchinson A.W. J. Biol. Chem. 271:22203-22210(1996).
[0591] 194. FMN-dependent alpha-hydroxy acid dehydrogenases active site (FMN_dh)

A number of oxidoreductases that act on alpha-hydroxy acids and which are FMN-containing flavoproteins have been shown [1,2,3] to be structurally related; these enzymes are:
- Lactate dehydrogenase (EC 1.1.2.3), which consists of a dehydrogenase domain and a heme-binding domain called cytochrome b2, and which catalyzes the conversion of lactate into pyruvate.
- Glycolate oxidase (EC 1.1.3.15) ((S)-2-hydroxy-acid oxidase), a peroxisomal enzyme that catalyzes the conversion of glycolate and oxygen to glyoxylate and hydrogen peroxide.
- Long chain alpha-hydroxy acid oxidase from rat (EC 1.1.3.15), a peroxisomal enzyme.
- Lactate 2-monooxygenase (EC 1.13.12.4) (lactate oxidase) from Mycobacterium smegmatis, which catalyzes the conversion of lactate and oxygen to acetate, carbon dioxide and water.
- (S)-mandelate dehydrogenase from Pseudomonas putida (gene mdlB), which catalyzes the conversion of (S)-mandelate to benzoylformate.

The first step in the reaction mechanism of these enzymes is the abstraction of the proton from the alpha-carbon of the substrate, producing a carbanion which can subsequently attach to the N5 atom of FMN. A conserved histidine has been shown [4] to be involved in the removal of the proton. The region around this active site residue is highly conserved and contains an arginine residue which is involved in substrate binding.
[0592] Consensus pattern: S-N-H-G-[AG]-R-Q [H is the active site residue] [R is a substrate-binding residue]

[1] Giegel D.A., Williams C.H. Jr., Massey V. J. Biol. Chem. 265:6626-6632(1990).

[2] ... Biochemistry 29:9856-9862(1990).

[3] Le K.H.D., Lederer F. J. Biol. Chem. 266:20877-20880(1991).
[4] Lindqvist Y., Branden C.-I. J. Biol. Chem. 264:3624-3628(1989).

[0593] 195. Flavin-binding monooxygenase-like (FMO-like) 

[0594] This family includes FMO proteins, cyclohexanone monooxygenase

[0595] 196. (FPGS) 

Folylpolyglutamate synthase signatures (aka Mur_ligase)

[0596] Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes the ATP-dependent addition of glutamate moieties to tetrahydrofolate.

[0597] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two signature patterns based on the conserved regions, which are rich in glycine residues and could play a role in the catalytic activity and/or in substrate binding.

[0598] Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(3)-[GSK]. Sequences known to belong to this class detected by the pattern: ALL.

[0599] Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2). Sequences known to belong to this class detected by the pattern: ALL.

[1] ... Stover P. Adv. Exp. Med. Biol. 338:629-634

[0601] 197. FYVE zinc finger

[0602] The FYVE zinc finger is named after four proteins in which it has been found: Fab1, YOTB/ZK632.12, Vac1 and EEA1. The FYVE finger has been shown to bind two Zn++ ions [1]. The FYVE finger has eight potential zinc-coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where + represents a charged residue and X any residue. Members were included which do not conserve these histidine residues but are clearly related.
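The histidine-containing motif R+HHC+XCG can be written as a regular expression once the '+' class is fixed. The sketch below assumes that "charged" means D, E, K or R (histidine is deliberately left out of that class); the function name `fyve_his_motifs` and the test sequences are illustrative only. As noted above, some clearly related FYVE fingers lack these histidines, so the absence of a match does not exclude family membership.

```python
import re

# Assumption: "+" (charged residue) is read as D, E, K or R.
CHARGED = "DEKR"
FYVE_HIS_MOTIF = re.compile(rf"R[{CHARGED}]HHC[{CHARGED}].CG")   # R+HHC+XCG


def fyve_his_motifs(sequence: str):
    """Return 1-based start positions of matches to R+HHC+XCG."""
    return [m.start() + 1 for m in FYVE_HIS_MOTIF.finditer(sequence)]
```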

[0603] [1] Stenmark H, Aasland R, Toh BH, D'Arrigo A. J Biol Chem 1996;271:24048-24054. [2] Gaullier JM, Simonsen A, D'Arrigo A, Bremnes B, Stenmark H, Aasland R. Nature 1998;394:432-433.

[0604] 198. F_actin_cap_B 
F-actin capping protein beta subunit signature 

not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta.

Consensus pattern: C-D-Y-N-R-D. Sequences known to belong to this class detected by the pattern: ALL.

[1] ... Nature 344:352-354(1990).

[0609] 199. Isopenicillin N synthetase signatures (Fe_Asc_oxidored)

Isopenicillin N synthetase (IPNS) [1,2] is a key enzyme in the biosynthesis of penicillin and cephalosporin.

[0610] Consensus pattern: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL]
Consensus pattern: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG]

[1] Martin J.F. Trends Biotechnol. 5:306-308(1987).

... Aharonowitz Y. Trends Biotechnol. 8:105-111(1990).

[4] Kovacevic S., Weigel B.J., Tobin M.B., Ingolia T.D., Miller J.R. J. Bacteriol. 171:754-760(1989).
[0611] 200. Fibrillarin signature 

Fibrillarin [1] is a component of a nucleolar small nuclear ribonucleoprotein (snRNP) particle thought to participate in the first step in processing pre-ribosomal RNA.

[0612] Consensus pattern: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE]

[1] Aris J.P., Blobel G. Proc. Natl. Acad. Sci. U.S.A. 88:931-935(1991).

[2] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).

[3] Agha-Amiri K. J. Bacteriol. 176:2124-2127(1994).

[0613] 201. Filamin/ABP280 repeat

[0617] [1] Breton C, Oriol R, Imberty A. Glycobiology 1998;8:87-94
[0618] 203. 2Fe-2S ferredoxins, iron-sulfur binding region signature (fer2A)
cluster(s) and according to sequence similarities. One of these subgroups are the 2Fe-2S ferredoxins, which are proteins or domains of around one hundred amino acid residues that bind a single 2Fe-2S iron-sulfur cluster. The proteins that are known [2] to belong to this family are listed below:
- Ferredoxin from photosynthetic organisms; namely plants and algae, where it is located in the chloroplast or cyanelle, and cyanobacteria.
- Ferredoxin from archaebacteria of the Halobacterium genus.
- Ferredoxin IV (gene pftA) and V (gene fdxD) from Rhodobacter capsulatus.
- Ferredoxin in the toluene degradation operon (gene xylT) and naphthalene degradation operon (gene nahT) of Pseudomonas putida.
- Hypothetical Escherichia coli protein yfaE.
- The N-terminal domain of the bifunctional ferredoxin/ferredoxin reductase electron transfer component of the benzoate 1,2-dioxygenase complex (gene benC) from Acinetobacter calcoaceticus, the toluene 4-monooxygenase complex (gene tmoF), the toluate 1,2-dioxygenase system (gene xylZ) and the xylene monooxygenase system (gene xylA) from Pseudomonas.
- The N-terminal domain of phenol hydroxylase protein p5 (gene dmpP) from Pseudomonas putida.
- The N-terminal domain of methane monooxygenase component C (gene mmoC) from Methylococcus capsulatus.
- The C-terminal domain of the vanillate degradation pathway protein vanB in a Pseudomonas species.
- The N-terminal domain of bacterial fumarate reductase iron-sulfur protein (gene frdB).
- The N-terminal domain of CDP-6-deoxy-3,4-glucoseen reductase (gene ascD) from Yersinia pseudotuberculosis.
- The central domain of eukaryotic succinate dehydrogenase (ubiquinone) iron-sulfur protein.
- The N-terminal domain of eukaryotic xanthine dehydrogenase.
- The N-terminal domain of eukaryotic aldehyde oxidase.

In the 2Fe-2S ferredoxins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.
[0619] Consensus pattern: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C [The three C's are 2Fe-2S ligands]
[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).
[2] Harayama S., Polissi A., Rekik M. FEBS Lett. 285:85-88(1991).

[0620] Adrenodoxin family, iron-sulfur binding region signature (fer2B)

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur cluster(s) and according to sequence similarities. One family of ferredoxins groups together the following proteins that all bind a single 2Fe-2S iron-sulfur cluster:
- Adrenodoxin (ADX) (adrenal ferredoxin), a vertebrate mitochondrial protein which transfers electrons from adrenodoxin reductase to cytochrome P450scc, which is involved in cholesterol side chain cleavage.
- Putidaredoxin (PTX), a Pseudomonas putida protein which transfers electrons from putidaredoxin reductase to cytochrome P450-cam, which is involved in the oxidation of camphor.
- Terpredoxin [2], a Pseudomonas protein which transfers electrons from terpredoxin reductase to cytochrome P450-terp, which is involved in the oxidation of alpha-terpineol.
- Rhodocoxin [3], a Rhodococcus protein which transfers electrons from rhodocoxin reductase to cytochrome CYP116 (thcB), which is involved in the degradation of thiocarbamate herbicides.
- Escherichia coli ferredoxin (gene fdx) [4], whose exact function is not yet known.
- Rhodobacter capsulatus ferredoxin VI [5], which may transfer electrons to a yet uncharacterized oxygenase.
- Caulobacter crescentus ferredoxin (gene fdxB) [6].

In these proteins, four cysteine residues bind the iron-sulfur cluster. Three of these cysteines are clustered together in the same region of the protein. Our signature pattern spans that iron-sulfur binding region.

[0621] Consensus pattern: C-x(2)-[STAQ]-x-[STAMV]-C-[STA]-T-C-[HR] [The three C's are 2Fe-2S ligands]
[1] Meyer J. Trends Ecol. Evol. 3:222-226(1988).

CheS7"419il^^^^^^^^ ' ■ « • Witney F. Lorence M.C. J. B»l. 

[3] ... Compernolle F., Proost P., Vanderleyden J., De Mot R. J. Bacteriol. 177:676-687(1995).

[4] Ta D.T., Vickery L.E. J. Biol. Chem. 267:11120-11125(1992).

[5] Naud I., Vincon M., Garin J., Gaillard J., Forest E., Jouanneau Y. Eur. J. Biochem. 222:933-939(1994).
[6] Amemiya K. EMBL/GenBank: X51607.

[0622] 204. 4Fe-4S ferredoxins, iron-sulfur binding region signature (fer4) 

Ferredoxins [1] are a group of iron-sulfur proteins which mediate electron transfer in a wide variety of metabolic reactions. Ferredoxins can be divided into several subgroups depending upon the physiological nature of the iron-sulfur cluster(s). One of these subgroups are the 4Fe-4S ferredoxins, which are found in bacteria and which are thus often referred to as 'bacterial-type' ferredoxins. The structure of these proteins [2] consists of the duplication of a domain of twenty-six amino acid residues; each of these domains contains four cysteine residues that bind to a 4Fe-4S center. A number of proteins have been found [3] that include one or more 4Fe-4S binding domains similar to those of bacterial-type ferredoxins. These proteins are listed below (references are only provided for recently determined sequences):
[0623] - The iron-sulfur proteins of the succinate dehydrogenase and the fumarate reductase complexes (EC 1.3.99.1). These enzyme complexes, which are components of the tricarboxylic acid cycle, each contain three subunits: a flavoprotein, an iron-sulfur protein and a cytochrome. The iron-sulfur proteins contain three different iron-sulfur centers: a 2Fe-2S, a 3Fe-4S and a 4Fe-4S.
- Escherichia coli anaerobic glycerol-3-phosphate dehydrogenase (EC 1.1.99.5). This enzyme is composed of three subunits: A, B, and C. The C subunit seems to be an iron-sulfur protein
with two ferredoxin-like domains in the N-terminal part of the protein. - Escherichia coli anaerobic dimethyl sulfoxide 
reductase. The B subunrt of this enzyme (gene dmsB) is an iron-sulfur protein with four 4Fe-4S ferredoxin-like domains. 
- Escherichia coli formate hydrogen lyase. Two of the subunits of this oligomeric complex (genes hycB and hycF) seem 
to be ifon-sulf ur proteins that each contain two 4Fe-4S ferredoxin-like domains. - Methanobacterium formicicum formate 
dehydrogenase (EC 1.2.1.2). This enzyme is used by the archaebacteria to grow on formate. The beta chain of this 
dimeric enzyme probably binds two 4Fe-4S centers. - Escherichia coli formate dehydrogenases N and O (EC 1.2.1.2 ). 
The beta chain of these two enzymes (genes fdnH and fdoH) are iron-sulfur proteins with four 4Fe-4S ferredoxin-like 
domains. - Desulfovibrb periplasmic [Fe] hydrogenase (EC 1.18.99.1 ). The large chain of this dimeric enzyme binds 
three 4Fe-4S centers, two of which are located in the ferredoxin-like N-terminal region of the protein. - Methanobac- 
terium thermoautrophicum methyl viologen-reducing hydrogenase subunit mvhB, which contains six tandemly repeated 
ferredoxin-like domains and which probably binds twelve 4Fe-4S centers. - Salmonella typhimurium anaerobic sulfite 
reductase (EC 1 .8.1.-) [4]. Two of the subunits of this enzyme (genes asrA and asrC) seem to both bind two 4Fe^S 
centers. - A Ferredoxin-like protein (gene fixX) from the nitrogen-fixation genes locus of varbus Rhizobium species, 
and one from the Nif-region of Azotobacter species. - The 9 Kd polypeptide of chtoroplast photosystem I [5] (gene 
psaC). This protein contains two low potential 4Fe-4S centers, referred as the A and B centers. - The chloroplast frxB 
protein which is predicted to carry two 4Fe-4S centers. - An ferredoxin from a primitive eukaryote. the enteric amoeba 
Entamobea histolytica. - Escherichia coli hypothetical protein yjjW, a protein with a N-terminal region belonging to the 
radical activating enzymes family (see <PDOC00834>) and two potential 4Fe-4S centersThe pattern of cysteine res- 
20 idues in the iron-sulfur region is sufficient todetect this class of 4Fe-4S binding proteins. 

[0624] Consensus pattern: C-x(2)-C-x(2)-C-x(3)-C-[PEG] [The four C's are 4Fe-4S ligands].

[ 1] Meyer J. Trends Ecol. Evol. 3:222-226(1988). 
[ 2] Otaka E., Ooi T. J. Mol. Evol. 26:257-267(1987).
[ 3] Beinert H. FASEB J. 4:2483-2492(1990).

[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).
[ 5] Knaff D.B. Trends Biochem. Sci. 13:460-461(1988). 
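The consensus patterns in this section follow the usual PROSITE conventions (assumed here, since the section does not define them): x is any residue, [..] lists allowed residues, {..} lists excluded residues, (n) or (n,m) gives a repeat count, and < or > anchors a pattern to the N- or C-terminus. A minimal sketch, assuming those conventions, translates such a pattern into a Python regular expression and scans a sequence with the 4Fe-4S pattern above. The helper names and the test sequence are hypothetical illustrations, not part of any PROSITE tool.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex.

    Handles x (any residue), [..] (allowed residues), {..} (excluded
    residues), (n) / (n,m) repeat counts and the < / > terminus anchors.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            piece = "."
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"
        else:
            piece = core  # a single residue or a [..] class is already valid regex
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + piece + suffix)
    return "".join(parts)


def scan(pattern: str, sequence: str) -> list:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in rx.finditer(sequence)]


FER4 = "C-x(2)-C-x(2)-C-x(3)-C-[PEG]"
seq = "AAYVINDSCIACGACKPECPVNIQ"  # hypothetical sequence, for illustration only
print(prosite_to_regex(FER4))  # C.{2}C.{2}C.{3}C[PEG]
print(scan(FER4, seq))         # [(8, 'CIACGACKPECP')]
```

Because `finditer` reports non-overlapping hits, two separate ferredoxin-like domains in one chain are both found; overlapping occurrences would need a lookahead-based search.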

[0625] 205. NifH/frxC family signatures (fer4_NifH) 

Nitrogenase (EC 1.18.6.1) [1] is the enzyme system responsible for biological nitrogen fixation. Nitrogenase is an oligomeric complex which consists of two components: component 1, which contains the active site for the reduction of nitrogen to ammonia, and component 2 (also called the iron protein). Component 2 is a homodimer of a protein (gene nifH) which binds a single 4Fe-4S iron-sulfur cluster [2]. In the nitrogen fixation process nifH is first reduced by a protein such as ferredoxin; the reduced protein then transfers electrons to component 1 with the concomitant consumption of ATP. A number of proteins are known to be evolutionarily related to nifH. These proteins are:
- Chloroplast-encoded frxC (or chlL) protein [3]. FrxC is encoded on the chloroplast genome of some plant species; its exact function is not known, but it could act as an electron carrier in the conversion of protochlorophyllide to chlorophyllide.
- Rhodobacter capsulatus proteins bchL and bchX [4]. These proteins are also likely to play a role in chlorophyll synthesis.
There are a number of conserved regions in the sequence of these proteins: in the N-terminal section there is an ATP-binding site motif A (P-loop), and in the central section there are two conserved cysteines which have been shown, in nifH, to be the ligands of the 4Fe-4S cluster. Two signature patterns that correspond to the regions around these cysteines were developed.
[0626] Consensus pattern: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G [C binds the iron-sulfur center].
Consensus pattern: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P [C binds the iron-sulfur center].

[ 1] Pau R.N. Trends Biochem. Sci. 14:183-186(1989).

[ 2] Georgiadis M.M., Komiya H., Chakrabarti P., Woo D., Kornuc J.J., Rees D.C. Science 257:1653-1659(1992).
[ 3] Fujita Y., Takahashi Y., Kohchi T., Ozeki H., Ohyama K., Matsubara H. Plant Mol. Biol. 13:551-561(1989).
[ 4] Burke D.H., Alberti M., Hearst J.E. J. Bacteriol. 175:2407-2413(1993).
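Both NifH/frxC patterns are annotated with the cysteine that binds the iron-sulfur center, so a scan can report where that ligand sits rather than just whether the pattern matches. A minimal sketch, assuming a hand translation of the two patterns into Python regular expressions with a named group `cys` around the annotated cysteine; the dictionary keys, the function name and the test fragment are hypothetical.

```python
import re

# Hand-translated NifH/frxC signatures; the named group "cys" marks the
# cysteine annotated as binding the iron-sulfur center.
NIFH_PATTERNS = {
    "NifH/frxC-1": re.compile(r"E.GGP.{2}[GA].G(?P<cys>C)[AG]G"),
    "NifH/frxC-2": re.compile(r"D.LGDVV(?P<cys>C)GGF[AG].P"),
}


def cluster_ligands(sequence: str) -> dict:
    """Map each signature name to the 1-based positions of its cysteine."""
    return {
        name: [m.start("cys") + 1 for m in rx.finditer(sequence)]
        for name, rx in NIFH_PATTERNS.items()
    }


# Hypothetical fragment built to contain both motifs (not a database entry).
fragment = "MKVESGGPEPGVGCAGRGAAAADVLGDVVCGGFAMPI"
print(cluster_ligands(fragment))  # {'NifH/frxC-1': [14], 'NifH/frxC-2': [30]}
```

Requiring both signatures to hit, in this N- to C-terminal order, is one way to reduce chance matches of the shorter pattern.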

[0627] 206. Ferritin iron-binding regions signatures

Ferritin [1,2] is one of the major non-heme iron storage proteins. It consists of a mineral core of hydrated ferric oxide, and a multi-subunit protein shell which engulfs the former and assures its solubility in an aqueous environment. In animals the protein is mainly cytoplasmic and there are generally two or more genes that encode closely related subunits (in mammals there are two subunits, known as H(eavy) and L(ight)). In plants ferritin is found in the chloroplast [3]. There are a number of well-conserved regions in the sequence of ferritins; two of these regions were selected to develop signature patterns. The first pattern is located in the central part of the sequence of ferritin and contains three conserved glutamates which are thought to be involved in the binding of iron. The second pattern is located in the C-terminal section; it corresponds to a region which forms a hydrophilic channel for small molecules.
[0628] Consensus pattern: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-x-G-R [The 3 E's are potential iron ligands].

[ 1] Crichton R.R.. Charloteaux-Wauters M. Eur. J. Biochem. 164:485-506(1987) 
[ 2] Theil E.C. Annu. Rev. Biochem. 56:289-315(1987).

[ 3] Ragland M., Briat J.-F., Gagnon J., Laulhere J.-P., Massenet O., Theil E.C. J. Biol. Chem. 265:18339-18344(1990).
[0629] 207. Intermediate filaments signature (filament) 

Intermediate filaments (IF) [1,2,3] are proteins which are primordial components of the cytoskeleton and the nuclear envelope. IF proteins are members of a very large multigene family of proteins which has been subdivided in five major subgroups:
- Type I: Acidic cytokeratins.
- Type II: Basic cytokeratins.
- Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin, and plasticin.
- Type IV: Neurofilaments L, H and M, alpha-internexin and nestin.
- Type V: Nuclear lamins A, B1, B2 and C.
All IF proteins are structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues which is arranged in coiled-coil alpha-helices, with at least two short characteristic interruptions; an N-terminal non-helical domain (head) of variable length; and a C-terminal domain (tail) which is also non-helical, and which shows extreme length variation between different IF proteins. While IF proteins are evolutionarily and structurally related, they have limited sequence similarity except in several regions of the rod domain. A conserved region of the rod domain was used as a sequence pattern for this class of proteins.
[0630] Consensus pattern: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE]- 

[ 1] Quinlan R., Hutchison C., Lane B. Protein Prof. 2:801-952(1995).
[ 2] Steinert P.M., Roop D.R. Annu. Rev. Biochem. 57:593-625(1988).
[ 3] Stewart M. Curr. Opin. Cell Biol. 2:91-100(1990). 

[0631] 208. Flavodoxin signature 

Flavodoxins [1] are electron-transfer proteins that function in various electron transport systems. Flavodoxins bind one FMN molecule, which serves as a redox-active prosthetic group. Flavodoxins are functionally interchangeable with ferredoxins. They have been isolated from prokaryotes, cyanobacteria, and some eukaryotic algae.

[0632] Consensus pattern: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-x(2)-[LIV].
[ 1] Wakabayashi S., Kimura K., Matsubara H., Rogers L.J. Biochem. J. 263:981-984(1989).
[0633] 209. Growth factor and cytokines receptors family signatures (fn3) 

A number of receptors for lymphokines, hematopoietic growth factors and growth hormone-related molecules have been found [1 to 5] to share a common binding domain. Receptors known to belong to this family are: - Cytokine receptor common beta chain. This chain is common to the IL-3, IL-5 and GM-CSF receptors. - Cytokine receptor

TeZ (SNTprn'^Enillrt'^'M" '^T'^'Ta ''''^ '^"^ 'L" 13 receptors. - Ciliary neS^hlc a o 
receptor (CNTFR). - Erythropoietin receptor (EPOR). - Granulocyte colony-stimulating factor receptor (G-CSFR). - Granulocyte-macrophage colony-stimulating factor receptor alpha chain (GM-CSFR). - Interleukin-2 receptor beta chain (IL2R-beta). - Interleukin-3 receptor alpha chain (IL3R). - Interleukin-4 receptor alpha chain (IL4R). - Interleukin-9 receptor (IL9R). - Growth hormone receptor (GRHR). - Prolactin receptor (PRLR). - Thrombopoietin receptor (TPOR).
The shared domain is part of the extracellular ligand-binding region and is about 200 amino acid residues long. Two signature patterns were derived for this family: the first spans two cysteines that are linked by a disulfide bond, and the second is a tryptophan-rich pattern located at the C-terminal extremity of the extracellular region.

[0634] Consensus pattern: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W [The two C's are linked by a disulfide bond].
Consensus pattern: [STGL]-x-W-[SG]-x-W-S.

[ 1] Bazan J.F. Biochem. Biophys. Res. Commun. 164:788-795(1989).
[ 2] Bazan J.F. Proc. Natl. Acad. Sci. U.S.A. 87:6934-6938(1990).

S'SaT-s^oS.^"'" ^'"^"^ ^"^'^ "^^^"''^ 

[ 4] d'Andrea A.D., Fasman G.D., Lodish H.F. Cell 58:1023-1024(1989).

[ 5] d'Andrea A.D., Fasman G.D., Lodish H.F. Curr. Opin. Cell Biol. 2:648-651(1990).
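The two receptor signatures are described positionally: the disulfide-linked cysteine pattern lies in the binding domain and the tryptophan-rich pattern lies at the C-terminal end of the extracellular region. A minimal sketch, assuming hand-translated Python regexes for both patterns and treating "tryptophan-rich motif after the cysteine motif" as the acceptance rule (an illustrative assumption, not a rule stated in the source); the function name and test fragment are hypothetical.

```python
import re

# Hand-translated versions of the two cytokine-receptor signatures.
DISULFIDE_MOTIF = re.compile(r"C[LVFYR].{7,8}[STIVDN]C.W")
WSXWS_MOTIF = re.compile(r"[STGL].W[SG].WS")


def receptor_signatures(sequence: str):
    """Return 1-based starts of (cysteine motif, tryptophan-rich motif).

    Returns None when either motif is absent, or when the tryptophan-rich
    motif does not lie downstream of the cysteine motif.
    """
    cys = DISULFIDE_MOTIF.search(sequence)
    trp = WSXWS_MOTIF.search(sequence)
    if cys is None or trp is None or trp.start() < cys.end():
        return None
    return cys.start() + 1, trp.start() + 1


ecd = "MKLCLVQESGSVNCTWQQAAAAAAGSRWSEWSPT"  # hypothetical extracellular fragment
print(receptor_signatures(ecd))  # (4, 26)
```

Note that `x(7,8)` becomes `.{7,8}`; the regex engine tries eight residues first and backtracks to seven when needed.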

[0635] 210. Phosphoribosylglycinamide formyltransferase active site (formyl_transf)

Phosphoribosylglycinamide formyltransferase (EC 2.1.2.2) (GART) [1] catalyzes the third step in de novo purine biosynthesis, the transfer of a formyl group to 5'-phosphoribosylglycinamide. In higher eukaryotes GART is part of a multifunctional enzyme polypeptide that catalyzes three of the steps of purine biosynthesis. In bacteria, plants and yeast GART is a monofunctional protein of about 200 amino acid residues. In the Escherichia coli enzyme an aspartic acid residue has been shown to be involved in the catalytic mechanism. The region around this active site residue is well conserved in GART from prokaryotic and eukaryotic sources and can be used as a signature pattern. Formyltetrahydrofolate dehydrogenase (EC 1.5.1.6) [2] is a cytosolic enzyme responsible for the NADP-dependent conversion of 10-formyltetrahydrofolate into tetrahydrofolate and CO2. Its N-terminal domain (200 residues) is structurally related to GARTs. Escherichia coli methionyl-tRNA formyltransferase (gene fmt) [3] is responsible for the formylation of the initiator tRNA, methionyl-tRNA(fMet). The central part of fmt seems to be evolutionarily related to GARTs.

mmoT.:r:Z.TrZ^^^^^^ x(2)-IUVn-x(6)- 

[ 1] Inglese J., Smith J.M., Benkovic S.J. Biochemistry 29:6678-6687(1990).
[ 2] Cook R.J., Lloyd R.S., Wagner C. J. Biol. Chem. 266:4965-4973(1991).

[ 3] Guillon J.-M., Mechulam Y., Schmitter J.-M., Blanquet S., Fayat G. J. Bacteriol. 174:4294-4301(1992).
[0637] 211. G10 protein signatures 

G10 [1] has been found to be conserved in a wide range of eukaryotic species.

The function of G10 is still unknown. G10 is a protein of about 17 to 18 Kd (143 to 157 residues) that contains cysteine-rich segments which could be involved in metal-binding. As signature patterns, two of these cysteine-rich segments were selected.

[0638] Consensus pattern: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P.
Consensus pattern: C-x-H-C-G-C-[KRH]-G-C-[SA].

S LIl'^n^T Dworkin M B.. Richter J.D. Genes Dev 3:803-815(1989) 

[0640] 212. G-protein alpha subunit

G proteins couple receptors of extracellular signals to intracellular signaling pathways. The G protein alpha subunit binds guanyl nucleotide and is a weak GTPase. Number of members: 195.

r'JI Snw n^rot^' ^'^^T ^' '-'"'^'^ ^"^^ SP-^^S SR, Science 1994:265:1405-1412 

[2] How G proteins work: a continuing story. Coleman DE, Sprang SR, Trends Biochem Sci 1996;21:41-44.

[0642] 213. Glucose-6-phosphate dehydrogenase active site (G6PD)

Glucose-6-phosphate dehydrogenase (EC 1.1.1.49) (G6PD) [1] catalyzes the first step in the pentose phosphate pathway, the oxidation of glucose-6-phosphate to gluconolactone 6-phosphate. A lysine residue has been identified as a reactive nucleophile associated with the activity of the enzyme. The sequence around this lysine is totally conserved from bacterial to mammalian G6PDs and can be used as a signature pattern.

[0643] Consensus pattern: D-H-Y-L-G-K-[EQK] [K is the active site residue].

K 2;i'G;Titp™^^^ 
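Because the G6PD pattern is fully specified except for its last position, a capture group around the lysine gives the active-site position directly, and a substitution at that lysine makes the pattern fail. A minimal sketch, assuming a hand translation of D-H-Y-L-G-K-[EQK] into a Python regex; the wild-type fragment and the K-to-A variant are hypothetical test strings, not sequences from this document.

```python
import re

# Hand-translated G6PD signature; group 1 captures the active-site lysine.
G6PD_ACTIVE_SITE = re.compile(r"DHYLG(K)[EQK]")


def active_site_lysines(sequence: str) -> list:
    """Return the 1-based position of the lysine in every signature hit."""
    return [m.start(1) + 1 for m in G6PD_ACTIVE_SITE.finditer(sequence)]


wild_type = "ATQSGVRTRIDHYLGKEMVQNLLV"          # hypothetical fragment
mutant = wild_type.replace("DHYLGKE", "DHYLGAE")  # hypothetical K->A variant
print(active_site_lysines(wild_type))  # [16]
print(active_site_lysines(mutant))     # []
```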

The GATA family of transcription factors are proteins that bind to DNA sites with the consensus sequence (A/T)GATA(A/G), found within the regulatory region of a number of genes. Proteins currently known to belong to this family are:
- GATA-1 [1] (also known as Eryf1, GF-1 or NF-E1), which binds to the GATA region of globin genes and other genes expressed in erythroid cells. It is a transcriptional activator which probably serves as a general 'switch' factor for erythroid development.
- GATA-3 [3], a transcriptional activator which binds to the enhancer of the T-cell receptor alpha and delta genes.
- GATA-4 [4], a transcriptional activator expressed in endodermally derived tissues and heart.
- A Drosophila protein which acts as a repressor of the achaete-scute complex (as-c).
- Bombyx mori BCFI [5], which regulates the expression of chorion genes.
- Caenorhabditis elegans elt-1 and elt-2, transcriptional activators






EP 1 033 405 A2 



of genes containing the GATA region, including vitellogenin genes [6].
- Ustilago maydis urbs1 [7], a protein involved in the repression of the biosynthesis of siderophores.
- Fission yeast protein GAF2.
All these transcription factors contain a pair of highly similar 'zinc finger' type domains with the consensus sequence C-x2-C-x17-C-x2-C. Some other proteins contain a single zinc finger motif highly related to those of the GATA transcription factors. These proteins are:
- Drosophila box A-binding factor (ABF) (also known as protein serpent (gene srp)), which may function as a transcriptional activator protein and may play a key role in the organogenesis of the fat body.
- Emericella nidulans areA [8], a transcriptional activator which mediates nitrogen metabolite repression.
- Neurospora crassa nit-2 [9], a transcriptional activator which turns on the expression of genes coding for enzymes required for the use of a variety of secondary nitrogen sources during conditions of nitrogen limitation.
- Neurospora crassa white collar proteins 1 and 2 (WC-1 and WC-2), which control expression of light-regulated genes.
- Saccharomyces cerevisiae DAL80 (or UGA43), a negative nitrogen regulatory protein.
- Saccharomyces cerevisiae GLN3, a positive nitrogen regulatory protein.
- Saccharomyces cerevisiae GAT1.
- Saccharomyces cerevisiae GZF3.

[0646] Consensus pattern: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C [The four C's are zinc ligands].

[ 1] Trainor C.D., Evans T., Felsenfeld G., Boguski M.S. Nature 343:92-96(1990).

[ 2] Lee M.E., Temizer D.T., Clifford J.A., Quertermous T. J. Biol. Chem. 266:16188-16192(1991).

[ 3] Ho I.-C., Vorhees P., Marin N., Oakley B.K., Tsai S.-F., Orkin S.H., Leiden J.M. EMBO J. 10:1187-1192(1991).

[ 4] Spieth J., Shim Y.H., Lea K., Conrad R., Blumenthal T. Mol. Cell. Biol. 11:4651-4659(1991).

[ 5] Drevet J.R., Skeiky Y.A., Iatrou K. J. Biol. Chem. 269:10660-10667(1994).

[ 6] Hawkins M.G., McGhee J.D. J. Biol. Chem. 270:14666-14671(1995).

[ 7] Voisard C.P.O., Wang J., Xu P., Leong S.A., McEvoy J.L. Mol. Cell. Biol. 13:7091-7100(1993).

[ 8] Arst H.N. Jr., Kudla B., Martinez-Rossi N.M., Caddick M.X., Sibley S., Davies R.W. Trends Genet. 5:291-291(1989).

[ 9] Fu Y.-H., Marzluf G.A. Mol. Cell. Biol. 10:1056-1065(1990). 
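The GATA text gives the finger spacing as C-x2-C-x17-C-x2-C, while the consensus pattern allows variable gaps (x(4,5) and x(3,4)). A minimal sketch, assuming a hand translation of the pattern into a Python regex with one capture group per zinc-ligand cysteine, recovers the four cysteine positions and rebuilds the spacing string so the two descriptions can be compared. The function name and the test sequence are hypothetical.

```python
import re

# Hand-translated GATA zinc-finger pattern; the four groups capture the
# cysteines annotated as zinc ligands. x(4,5) and x(3,4) become {4,5}/{3,4}.
GATA_ZF = re.compile(
    r"(C).[DN](C).{4,5}[ST].{2}W[HR][RK].{3}[GN].{3,4}(C)N[AS](C)"
)


def zinc_ligands(sequence: str):
    """Return 1-based cysteine positions and the C-x(n)-C spacing string."""
    m = GATA_ZF.search(sequence)
    if m is None:
        return None
    pos = [m.start(i) for i in range(1, 5)]
    gaps = [b - a - 1 for a, b in zip(pos, pos[1:])]
    spacing = "C" + "".join(f"-x{g}-C" for g in gaps)
    return [p + 1 for p in pos], spacing


# Hypothetical sequence containing one GATA-type finger (illustrative only).
print(zinc_ligands("MSCANCQTTTTTLWRRNASGDPVCNACGLY"))
# ([3, 6, 24, 27], 'C-x2-C-x17-C-x2-C')
```

With the shorter choice in each variable gap, the pattern yields exactly the C-x2-C-x17-C-x2-C spacing quoted in the text; the longer choices would give a 19-residue middle gap.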
[0647] 215. Glutamine amidotransferases class-I active site (GATase)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-I GATase domains have been found in the following enzymes:
- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of tryptophan.
- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine. The second component (gene pabA) provides the GATase activity [4].
- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section [2].
- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the N-terminal section [5].
- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase), an enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein [6].
- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity.
- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in prokaryotes.
In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity. The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature pattern for class-I GATase.

[0648] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the active site residue].

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973). 

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).

[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).

[0649] 216. Glutamine amidotransferases class-II active site (GATase_2)

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or as part of a larger polypeptide fused in different ways to a synthase domain. Class-II GATase domains have been found in the following enzymes:
- Amido phosphoribosyltransferase (glutamine phosphoribosylpyrophosphate amidotransferase) (EC 2.4.2.14), an enzyme which catalyzes the first step in purine biosynthesis, the transfer of the ammonia group of glutamine to PRPP to form 5-phosphoribosylamine.
- Glucosamine-fructose-6-phosphate aminotransferase (EC 2.6.1.16). This enzyme catalyzes a key reaction in amino sugar synthesis, the formation of glucosamine 6-phosphate from fructose 6-phosphate and glutamine.
- Asparagine synthetase (glutamine-hydrolyzing) (EC 6.3.5.4). This enzyme is responsible for the synthesis of asparagine from aspartate and glutamine.
A cysteine is present at the N-terminal extremity of the mature form of all these enzymes. This cysteine has been shown in amidophosphoribosyltransferase [4] and in asparagine synthetase [5] to be important for the catalytic mechanism.

[0650] Consensus pattern: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG] [C is the active site residue].

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973). 

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[ 4] van Heeke G., Schuster M. J. Biol. Chem. 264:5503-5509(1989).

[ 5] Vollmer S.J., Switzer R.L., Hermodson M.A., Bower S.G., Zalkin H. J. Biol. Chem. 258:10582-10585(1983).
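The class-II pattern starts with "<", which ties it to the N-terminus: the catalytic cysteine must lie within the first twelve residues of the (mature) chain. A minimal sketch, assuming a hand translation in which `<x(0,11)` becomes `^.{0,11}`; the function name and both test strings are hypithetical illustrations.

```python
import re

# "<" anchors the pattern to the N-terminus, so <x(0,11) becomes ^.{0,11}.
GATASE_II = re.compile(r"^.{0,11}(C)[GS][IV][LIVMFYW][AG]")


def n_terminal_cys(sequence: str):
    """Return the 1-based position of the catalytic cysteine, or None.

    If several cysteines fit inside the window, the greedy .{0,11} makes
    the regex report the one furthest from the N-terminus.
    """
    m = GATASE_II.match(sequence)
    return m.start(1) + 1 if m else None


print(n_terminal_cys("MCGIVGIAGVMPVNQSIYDALTV"))  # 2
print(n_terminal_cys("A" * 30 + "CGIVG"))        # None
```

The second call shows why the anchor matters: the same C-G-I-V-G motif 30 residues into the chain does not satisfy the class-II signature.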
[0651] 217. GDP dissociation inhibitor (GDI) 

[0652] [1] Schalk I, Zeng K, Wu SK, Stura EA, Matteson J, Huang M, Tandon A, Wilson IA, Balch WE. Nature 1996;
[0653] 218. Oxidoreductase family (GFO_IDH_MocA)

This family of enzymes utilises NADP or NAD. This family is called the GFO/IDH/MOCA family in Swiss-Prot.
[0655] [1] Kingston RL, Scopes RK, Baker EN. Structure 1996;4:1413-1428.
[0656] 219. GHMP kinases putative ATP-binding domain 

Consensus pattern: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC].
[0657] [ 1] Tsay Y.H., Robinson G.W. Mol. Cell. Biol. 11:620-631(1991).
[0658] 220. Glucose inhibited division protein A family signatures (GIDA)

Bacterial glucose inhibited division protein A (gene gidA) is a protein of 70 Kd whose function is not yet known and whose sequence is highly conserved. It is evolutionarily related to a yeast hypothetical protein, Caenorhabditis elegans hypothetical protein F52H3.2 and a Bacillus subtilis protein.

[0659] Consensus pattern: [GS]-[PT]-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR].
[0660] 221. Glu / Leu / Phe / Val dehydrogenases active site

- Glutamate dehydrogenase (GluDH) catalyzes the NAD- and/or NADP-dependent reversible deamination of glutamate into alpha-ketoglutarate [1,2]. GluDH isozymes are generally involved with either ammonia assimilation or glutamate catabolism.
- Leucine dehydrogenase (EC 1.4.1.9) (LeuDH) is a NAD-dependent enzyme that catalyzes the reversible deamination of leucine and several other aliphatic amino acids to their keto analogues [3].
- Phenylalanine dehydrogenase (EC 1.4.1.20) (PheDH) is a NAD-dependent enzyme that catalyzes the reversible deamination of L-phenylalanine into phenylpyruvate [4].
- Valine dehydrogenase (EC 1.4.1.8) (ValDH) is a NADP-dependent enzyme that catalyzes the reversible deamination of L-valine into 3-methyl-2-oxobutanoate [5].

[0661] These dehydrogenases are structurally and functionally related. A conserved lysine residue located in a glycine-rich region is implicated in the catalytic mechanism, and the conservation of the region around this residue allows the derivation of a signature pattern for such type of enzymes.

E (K is the active site residue) Sequences 

known to belong to this class detected by the pattern ALL ^^cHuenutft, 

jSiur^^'L^rLer""" '''' '^^^ p'""'"" °' --pt- °' 

[ 1] Britton K.L., Baker P.J., Rice D.W., Stillman T.J. Eur. J. Biochem. 209:851-859(1992).
[ 2] Benachenhou-Lahfa N., Forterre P., Labedan B. J. Mol. Evol. 36:335-346(1993).

[ 3] Nagata S., Tanizawa K., Esaki N., Sakamoto Y., Ohshima T., Tanaka H., Soda K. Biochemistry 27:9056-9062(1988).

[ 4] Takada H., Yoshimura T., Ohshima T., Esaki N., Soda K. J. Biochem. 109:371-376(1991).
[ 5] Hutchinson C.R., Tang L. J. Bacteriol. 175:4176-4185(1993).

[0664] 222. GMC oxidoreductases signatures 

The following FAD flavoprotein oxidoreductases have been found [1,2] to be evolutionarily related. These enzymes, which are called 'GMC oxidoreductases', are listed below.
- Glucose oxidase (EC 1.1.3.4) (GOX). Reaction catalyzed: glucose + oxygen -> delta-gluconolactone + hydrogen peroxide.
- Methanol oxidase (EC 1.1.3.13) (MOX). Reaction catalyzed: methanol + oxygen -> formaldehyde + hydrogen peroxide.
- Choline dehydrogenase (EC 1.1.99.1) (CHD) from bacteria. Reaction catalyzed: choline + unknown acceptor -> betaine aldehyde + reduced acceptor.
- Glucose dehydrogenase (GLD) (EC 1.1.99.10) from Drosophila. Reaction catalyzed: glucose + unknown acceptor -> delta-gluconolactone + reduced acceptor.
- Cholesterol oxidase (EC 1.1.3.6) (CHOD) from Brevibacterium sterolicum and Streptomyces strain SA-COO. Reaction catalyzed: cholesterol + oxygen -> cholest-4-en-3-one + hydrogen peroxide.
- AlkJ [3], an alcohol dehydrogenase from Pseudomonas oleovorans which converts aliphatic alcohols into aldehydes.
This family also includes a lyase:
- (R)-mandelonitrile lyase (EC 4.1.2.10) (hydroxynitrile lyase) from plants [4], an enzyme involved in cyanogenesis, the release of hydrogen cyanide.
These enzymes are proteins of size ranging from 556 (CHD) to 664 (MOX) amino acid residues which share a number of regions of sequence similarity. One of these regions, located in the N-terminal section, corresponds to the FAD ADP-binding domain. The function of the other conserved domains is not yet known; two signature patterns were developed from these domains. The first one is located about 50 residues after the ADP-binding domain, while the second one is located in the central section of these proteins.
[0665] Consensus pattern: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(5)-[DNESH].
Consensus pattern: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G.

[ 1] Cavener D.R. J. Mol. Biol. 223:811-814(1992). 

[ 2] Henikoff S., Henikoff J.G. Genomics 19:97-107(1994).

[ f^ nT ^^I'l" ^- " • ^ Witholt B. Mol. IVIicrobiol. 6:3121-3136(1992) 

[ 4] Cheng I.R., Poulton J.E. Plant Cell Physiol. 34:1139-1143(1993).

[0666] 223. (GMP_synt_C)

Glutamine amidotransferases class-I active site

A large group of biosynthetic enzymes are able to catalyze the removal of the ammonia group from glutamine and then to transfer this group to a substrate to form a new carbon-nitrogen group. This catalytic activity is known as glutamine amidotransferase (GATase) (EC 2.4.2.-) [1]. The GATase domain exists either as a separate polypeptidic subunit or as part of a larger polypeptide fused in different ways to a synthase domain. On the basis of sequence similarities two classes of GATase domains have been identified [2,3]: class-I (also known as trpG-type) and class-II (also known as purF-type). Class-I GATase domains have been found in the following enzymes:

- The second component of anthranilate synthase (AS) (EC 4.1.3.27) [4]. AS catalyzes the biosynthesis of anthranilate from chorismate and glutamine. AS is generally a dimeric enzyme: the first component can synthesize anthranilate using ammonia rather than glutamine, whereas component II provides the GATase activity. In some bacteria and in fungi the GATase component of AS is part of a multifunctional protein that also catalyzes other steps of the biosynthesis of tryptophan.

- The second component of 4-amino-4-deoxychorismate (ADC) synthase (EC 4.1.3.-), a dimeric prokaryotic enzyme that functions in the pathway that catalyzes the biosynthesis of para-aminobenzoate (PABA) from chorismate and glutamine. The second component (gene pabA) provides the GATase activity [4].

- CTP synthase (EC 6.3.4.2). CTP synthase catalyzes the final reaction in the biosynthesis of pyrimidine, the ATP-dependent formation of CTP from UTP and glutamine. CTP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the C-terminal section [2].

- GMP synthase (glutamine-hydrolyzing) (EC 6.3.5.2). GMP synthase catalyzes the ATP-dependent formation of GMP from xanthosine 5'-phosphate and glutamine. GMP synthase is a single chain enzyme that contains two distinct domains; the GATase domain is in the N-terminal section [5].

- Glutamine-dependent carbamoyl-phosphate synthase (EC 6.3.5.5) (GD-CPSase), an enzyme involved in both arginine and pyrimidine biosynthesis and which catalyzes the ATP-dependent formation of carbamoyl phosphate from glutamine and carbon dioxide. In bacteria GD-CPSase is composed of two subunits: the large chain (gene carB) provides the CPSase activity, while the small chain (gene carA) provides the GATase activity. In yeast the enzyme involved in arginine biosynthesis is also composed of two subunits: CPA1 (GATase) and CPA2 (CPSase). In most eukaryotes, the first three steps of pyrimidine biosynthesis are catalyzed by a large multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals). The GATase domain is located at the N-terminal extremity of this polyprotein [6].

- Phosphoribosylformylglycinamidine synthase II (EC 6.3.5.3), an enzyme that catalyzes the fourth step in the de novo biosynthesis of purines. In some species of bacteria, FGAM synthase II is composed of two subunits: a small chain (gene purQ) which provides the GATase activity and a large chain (gene purL) which provides the aminator activity.

- The histidine amidotransferase hisH, an enzyme that catalyzes the fifth step in the biosynthesis of histidine in prokaryotes.

[0668] In the second component of AS a cysteine has been shown [7] to be essential for the amidotransferase activity. The sequence around this residue is well conserved in all the above GATase domains and can be used as a signature pattern for class-I GATase.

[0669] Consensus pattern: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA] [C is the active site residue]. Sequences known to belong to this class detected by the pattern: ALL, except for 6 sequences.
[0670] Note: in the first position of the pattern Pro is found in all cases except in the slime mold GD-CPSase where 
it is replaced by Ala. 

[ 1] Buchanan J.M. Adv. Enzymol. 39:91-183(1973). 

[ 2] Weng M., Zalkin H. J. Bacteriol. 169:3023-3028(1987).

[ 3] Nyunoya H., Lusty C.J. J. Biol. Chem. 259:9790-9798(1984).

[ 4] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989).

[ 5] Zalkin H., Argos P., Narayana S.V.L., Tiedeman A.A., Smith J.M. J. Biol. Chem. 260:3350-3354(1985).
[ 6] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).
[ 7] Tso J.Y., Hermodson M.A., Zalkin H. J. Biol. Chem. 255:1451-1457(1980).
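The note on the first position of the class-I pattern (Pro in all cases except the slime mold GD-CPSase, where it is Ala) is the kind of observation that can be tallied mechanically over a set of hits. A minimal sketch, assuming a hand-translated Python regex with a capture group on the first position; the function name and the three short test strings are hypothetical, not real GATase sequences.

```python
import re
from collections import Counter

# Hand-translated class-I GATase signature; group 1 is the first position
# ([PAS]) and group 2 is the active-site cysteine.
GATASE_I = re.compile(
    r"([PAS])[LIVMFYT][LIVMFY]G[LIVMFY](C)[LIVMFYN]G.[QEH].[LIVMFA]"
)


def first_position_usage(sequences) -> Counter:
    """Count which residue occupies the first pattern position in all hits."""
    counts = Counter()
    for seq in sequences:
        for m in GATASE_I.finditer(seq):
            counts[m.group(1)] += 1
    return counts


# Hypothetical fragments: a Pro-type site, an Ala-type site and a non-match.
seqs = ["GGPVLGICLGHQLLK", "GGAVLGICLGHQLLK", "GGGGGG"]
print(first_position_usage(seqs))  # Counter({'P': 1, 'A': 1})
```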

[0671] 224. Glutathione peroxidases signatures (GSHPx) 

Glutathione peroxidase (EC 1.11.1.9) (GSHPx) [1,2] is an enzyme that catalyzes the reduction of hydroperoxides by glutathione. Its main function is to protect against the damaging effect of endogenously formed hydroperoxides. In higher vertebrates at least four forms of GSHPx are known to exist: a ubiquitous cytosolic form (GSHPx-1), a gastrointestinal cytosolic form (GSHPx-GI) [3], a plasma secreted form (GSHPx-P) [4], and an epididymal secretory form (GSHPx-EP). In addition to these characterized forms, the sequence of a protein of unknown function [5] has been shown to be evolutionarily related to those of GSHPxs. In filarial nematode parasites such as Brugia pahangi the major soluble cuticular protein, known as gp29, is a secreted GSHPx which could provide a mechanism of resistance to the immune reaction of the mammalian host by neutralizing the products of the oxidative burst of leukocytes [6]. Escherichia coli protein btuE, a periplasmic protein involved in the transport of vitamin B12, is also evolutionarily related to GSHPxs, although the significance of this relationship is not clear. Selenium, in the form of selenocysteine [7], is part of the catalytic site of GSHPx. The region around the selenocysteine residue is moderately well conserved in GSHPxs and the related proteins and can be used as a signature pattern. As a second signature for this family of proteins, a highly conserved octapeptide located in the central section of these proteins was selected.

Consensus pattern: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-
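The consensus patterns in this document use PROSITE notation: hyphen-separated elements, x for any residue, [..] for a set of allowed residues and (n) or (n,m) for repeats; PROSITE also uses {..} for excluded residues and < or > for the sequence ends. The following is a minimal Python sketch of translating such a pattern into a regular expression and searching a sequence with it. It assumes the trailing hyphen printed after each pattern here is only a terminator; the function name prosite_to_regex and the test sequence are hypothetical.

```python
import re

# One pattern element: a core (single residue, x, [..] or {..}) followed by
# an optional repeat count such as (2) or (0,1).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string."""
    # Drop whitespace and the trailing '-' (or '.') that ends each pattern.
    pattern = "".join(pattern.split()).rstrip("-.")
    parts = []
    for element in pattern.split("-"):
        prefix = "^" if element.startswith("<") else ""  # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""    # C-terminal anchor
        core, low, high = _ELEMENT.fullmatch(element.strip("<>")).groups()
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            regex = core                      # one residue or an allowed set [..]
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


gshpx = prosite_to_regex("[LIV]-[AGD]-F-P-[CS]-[NG]-Q-")
print(gshpx)                                     # [LIV][AGD]FP[CS][NG]Q
print(re.search(gshpx, "MKTLIAFPCNQW").group())  # IAFPCNQ
```

Bracketed sets are passed through unchanged because PROSITE and Python regexes happen to share that syntax; only x, {..}, the repeat counts and the end anchors need rewriting.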

[ 1] Mannervik B. Meth. Enzymol. 113:490-495(1985).

[ 2] Mullenbach G.T., Tabrizi A., Irvine B.D., Bell G.I., Tainer J.A., Hallewell R.A. Protein Eng. 2:239-246(1988).
[ 3] Chu F.F., Doroshow J.H., Esworthy R.S. J. Biol. Chem. 268:2571-2576(1993).
[ 4] Takahashi K., Akasaka M., Yamamoto Y., Kobayashi C., Mizoguchi J., Koyama J. J. Biochem. 108:145-148.

[ 5] Dunn D.K., Howells D.D., Richardson J., Goldfarb P.S. Nucleic Acids Res. 17:6390-6390(1989).
[ 6] Cookson E., Blaxter M.L., Selkirk M.E. Proc. Natl. Acad. Sci. U.S.A. 89:5837-5841(1992).
[ 7] Stadtman T.C. Annu. Rev. Biochem. 59:111-127(1990).

[0673] 225. (GST)
Glutathione S-transferases

[...] conjugation of reduced glutathione to a variety of targets. Also included in the alignment, but [...] Similarity to GST was previously noted. Eukaryotic [...]

Not known to have GST activity; similarity not previously recognized. Supported by HMM and manual alignment [...]

[...] proteins in E. coli. Not known to have GST activity. Similarity not previously recognized. Supported by HMM and manual
alignment inspection. Alignment spans entire protein.

[0675] 226. GTP1/OBG family signature 

A family of GTP-binding proteins has been recently characterized [1,2]. This family currently includes:
- Mouse and Xenopus protein DRG.
- Human protein DRG2.
- Drosophila protein 128up.
- Fission yeast protein gtp1.
- [...]
- [...] hypothetical protein [...]G384.
- Yeast hypothetical protein [...].
- [...] hypothetical protein YGR173w.
- Caenorhabditis elegans hypothetical protein C02F5.3.

[...] polypeptides of about 40 to [...] Kd which contain the five small sequence elements characteristic of GTP-binding
proteins [3]. As a signature pattern, a region was selected [...] that corresponds to the ATP/GTP B motif (also called
G-3 in GTP-binding proteins).
[0676] Consensus pattern: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G-

[ 1] [...] Biochem. Biophys. Res. Commun. 189:363-370(1992).

[ 2] Hudson J.D., Young P.G. Gene 125:191-193(1993).

[ 3] Bourne H.R., Sanders D.A., McCormick F. Nature 349:117-127(1991). 

[0677] 227. (GTP_EFTU1) 
ATP/GTP-binding site motif A (P-loop) 

[0678] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable
proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved
of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix.
This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as
the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in which the P-loop
is found. The protein families for which the relevance of the presence of such a motif has been noted include:
- ATP synthase alpha and beta subunits (see <PDOC00137>).
- Myosin heavy chains.
- [...] (see <PDOC00362>).
- Guanylate kinase (see <PDOC00670>).
- Thymidine kinase (see <PDOC00524>).
- [...] (see <PDOC00868>).
- Nitrogenase [...].
- [...] (ABC transporters) [7] (see [...]).
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
- [...] ADP-ribosylation factors family (see <PDOC00781>).
- Bacterial dnaA protein (see <PDOC00771>).
- Bacterial recA protein (see <PDOC00131>).
- Bacterial recF protein (see <PDOC00[...]9>).
- Guanine nucleotide-binding proteins [...].
- [...] (see <PDOC00388>).
- Bacterial type II secretion system protein E (see <PDOC00567>).

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the
structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the
E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly
different form [...]. A special mention must be reserved for adenylate kinase, in which there is a single deviation
from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
Consensus pattern: [AG]-x(4)-G-K-[ST]-
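Because a protein can contain more than one P-loop-like segment, a scan should report every occurrence rather than only the first. The sketch below hand-translates [AG]-x(4)-G-K-[ST] into a Python regex wrapped in a lookahead so that overlapping hits are also returned. The relaxed variant, which accepts Gly in the last position as in the deviation described above, and the test sequences are hypothetical illustrations, not part of the original signature.

```python
import re

# [AG]-x(4)-G-K-[ST] written directly as a regex; the zero-width lookahead
# lets finditer report overlapping occurrences.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")
# Hypothetical relaxed variant: Gly is also accepted in the last position.
P_LOOP_RELAXED = re.compile(r"(?=([AG].{4}GK[STG]))")


def find_motifs(seq: str, motif: "re.Pattern[str]") -> "list[tuple[int, str]]":
    """Return (1-based start, matched segment) for each occurrence of motif."""
    return [(m.start() + 1, m.group(1)) for m in motif.finditer(seq)]


print(find_motifs("MGAGGSGKSTL", P_LOOP))        # [(2, 'GAGGSGKS')]
print(find_motifs("AKLLVGKGD", P_LOOP))          # []
print(find_motifs("AKLLVGKGD", P_LOOP_RELAXED))  # [(1, 'AKLLVGKG')]
```

Keeping each match zero-width means `finditer` restarts at the next residue instead of skipping past the whole eight-residue segment.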

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).

[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).

[ 7] [...] Gallagher M.P. J. Bioenerg. Biomembr. 22:[...](1990).

[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[0679] GTP-binding elongation factors signature (GTP_EFTU2) 

In both prokaryotes and eukaryotes, there are three distinct types of elongation factors, as described in the following table:

Eukaryotes   Prokaryotes   Function
EF-1alpha    EF-Tu         Binds GTP and an aminoacyl-tRNA; delivers the latter to the A site of ribosomes.
EF-1beta     EF-Ts         Interacts with EF-1a/EF-Tu to displace GDP and thus allows the regeneration of GTP-EF-1a.
EF-2         EF-G          Binds GTP and peptidyl-tRNA and translocates the latter from the A site to the P site.

[...] includes the following proteins:
- Eukaryotic peptide chain release factor [...]. These proteins interact with release factors that bind to ribosomes that
have encountered a stop codon at their decoding site and help [...] and the human homolog as GST1-Hs.
- Prokaryotic peptide chain release factor 3 (RF-3) (gene prfC) [4]. [...]
- [...] HBS1 [5] [...] Caenorhabditis elegans [...].
- [...] [6], a protein of unknown function which is highly similar to EF-1alpha.
- [...] [7], which seems to replace EF-Tu for the insertion of selenocysteine directed by the UGA codon.
- The tetracycline resistance proteins tetM/tetO [8,9] from various bacteria [...]. [...] inhibits binding of
aminoacyl-tRNAs. These proteins abolish the [...].
- [...] hypothetical protein yihK [11].

In EF-1-alpha, a specific region has been shown [12] to be involved in a conformational change mediated by the
hydrolysis of GTP to GDP. This region is conserved in both EF-1alpha/EF-Tu as well as EF-[...]. The pattern
developed for this family of proteins includes that conserved region.

[0680] Consensus pattern: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-[GSTACKRNQ]-

[ 1] Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).

[ 2] Moldave K. Annu. Rev. Biochem. 54:1109-1149(1985).

[ 3] [...] Dagkesamanskaya A.R., Poznyakovski A.I., Paushkin S.V., Nierras C.R., Cox B.S., Ter-Avanesyan M.D.,
Tuite M.F. EMBO J. 14:4365-4373(1995).

[ 4] Grentzmann G., Brechemier-Baey P., Heurgue-Hamard V., Buckingham R.H. J. Biol. Chem. 270:[...].

[ 5] Nelson R.J., Ziegelhoffer T., Nicolet C., Werner-Washburne M., Craig E.A. Cell 71:97-105(1992).

[ 6] Ann O.K., Moutsatsos I.K., Nakamura T., Lin H.H., Mao R.-L., Lee M.-J., Chin S., Liem R.K.H., Wang E. J. Biol.
Chem. 266:10429-10437(1991).

[ 7] Forchhammer K., Leinfelder W., Bock A. Nature 342:453-456(1989).
[ 8] Manavathu E.K., Hiratsuka K., Taylor D.E. Gene 62:17-26(1988).

[ 9] LeBlanc D.J., Lee L.N., Titmas B.M., Smith C.J., Tenover F.C. J. Bacteriol. 170:3618-3626(1988).

[12] Moller W., Schipper A., Amons R. Biochimie 69:983-989(1987).
[0681] 228. GTP cyclohydrolase II. 

GTP cyclohydrolase II catalyzes the first committed step in the biosynthesis of riboflavin.

229. Galactose-1-phosphate uridyl transferase signatures (GalP_UDP_transf)

Consensus pattern: [...]-x(3)-G-x(4)-H-P-H-x-Q [The two H's are the active site residues]-

[0684] Consensus pattern: D-L-P-[...]-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G [...]
Note: [...] structurally related to the HIT family of proteins (see <PDOC00694>).

[ 1] Reichardt J.K.V., Berg P. Nucleic Acids Res. 16:9017-9026(1988).
[ 2] Mollet B.. Pilloud N. J. Bacteriol. 173:4464-4473(1991). 

[0685] 230. Gamma-thionins family signature 

[0686] The following small plant proteins are evolutionary related: 

- A flower-specific thionin (FST) from tobacco [2].

- Inhibitors of insect alpha-amylases from sorghum [4].

- Probable protease inhibitor P322 from potato.

- A germination-related protein from cowpea [5].

- Soybean sulfur-rich protein SE60 [7].

- Vicia faba antibacterial peptides fabatin-1 and -2.



xxCxxxxxxxxxxCxxxxxCxxxCxxxxxxxxxCxxxxxxCxCxxxC

'C': conserved cysteine involved in a disulfide bond.

[0688] Consensus pattern: [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C [The four C's are involved in
disulfide bonds]-
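The bracketed note marks the four C's of this pattern as disulfide-bonded cysteines. As a minimal sketch, assuming a hand translation of the pattern into a Python regex with one capturing group per cysteine, their 1-based positions can be reported as follows; the function name and the synthetic test sequence are hypothetical.

```python
import re

# [KRG]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C, with each
# conserved cysteine captured so that its position can be reported.
GAMMA_THIONIN = re.compile(r"[KRG].(C).{3}[SV].{2}[FYWH].[GF].(C).{5}(C).{3}(C)")


def cysteine_positions(seq: str) -> "list[int] | None":
    """Return 1-based positions of the four pattern cysteines, or None if absent."""
    m = GAMMA_THIONIN.search(seq)
    if m is None:
        return None
    return [m.start(group) + 1 for group in range(1, 5)]


# Synthetic sequence assembled to fit the pattern (not a real protein).
demo = "MM" + "KAC" + "AAA" + "S" + "AA" + "F" + "A" + "G" + "A" + "C" + "AAAAA" + "C" + "AAA" + "C"
print(cysteine_positions(demo))  # [5, 16, 22, 26]
```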



[1] Bruix M., Jimenez M.A., Santoro J., Gonzalez C., Colilla F.J., Mendez E., Rico M. Biochemistry 32:715-724(1993).

[2] Gu Q., Kawata E.E., Morse M.-J., Wu H.-M., Cheung A.Y. Mol. Gen. Genet. 234:89-96(1992).

[3] Terras F.R.G., Torrekens S., van Leuven F., Osborn R.W., Vanderleyden J., Cammue B.P.A., Broekaert W.F.
FEBS Lett. 316:233-240(1993).

[4] Bloch C. Jr., Richardson M. FEBS Lett. 279:101-104(1991).

[5] Ishibashi N., Yamauchi D., Minamikawa T. Plant Mol. Biol. 15:59-64(1990).

[7] Choi Y., Choi Y.D., Lee J.S. Plant Physiol. 101:699-700(1993).

[0689] 231. Gelsolin. Gelsolin repeat. Number of members: 170

[0690] [1] Medline: 97433077. The crystal structure of plasma gelsolin: implications for actin severing, capping, and
nucleation. Burtnick LD, Koepf EK, Grimes J, Jones EY, Stuart DI, McLaughlin PJ, Robinson RC; Cell 1997;90:661-670
[0691] 232. Germin family signature 

Germins [1] are a family of homopentameric cereal glycoproteins expressed during germination which may play a role
in altering the properties of cell walls during germinative growth. It has been shown that wheat and barley germins act
as oxalate oxidases (EC 1.2.3.4), an enzyme that catalyzes the oxidative degradation of oxalate to carbonate and
hydrogen peroxide. Germins are highly similar to: - Germin-like proteins from various plants such as rape, violet or
white mustard. - Slime mold spherulins 1a and 1b which are proteins that accumulate specifically during spherulation,
a process induced by various forms of environmental stress which leads to encystment and dormancy. As a signature
pattern the best conserved region was selected: a decapeptide located in the central section of these proteins.
[0692] Consensus pattern: G-x(4)-H-x-H-P-x-A-x-E-[LIVM]-
[0693] [ 1] Lane B.G. FASEB J. 8:294-301(1994).
[0694] 233. (GlutR) 
Glutamyl-tRNA reductase signature 

[0695] Delta-aminolevulinic acid (ALA) is the obligatory precursor for the synthesis of all tetrapyrroles including
porphyrin derivatives such as chlorophyll and heme. ALA can be synthesized via two different pathways: the Shemin (or
C4) pathway which involves the single step condensation of succinyl-CoA and glycine and which is catalyzed by ALA
synthase (EC 2.3.1.37), and the C5 pathway from the five-carbon skeleton of glutamate. The C5 pathway operates
in the chloroplast of plants and algae, in cyanobacteria, in some eubacteria and in archaebacteria.
[0696] The initial step in the C5 pathway is carried out by glutamyl-tRNA reductase (GluTR) [1] which catalyzes the
NADP-dependent conversion of glutamyl-tRNA(Glu) to glutamate-1-semialdehyde (GSA) with the concomitant
release of tRNA(Glu), which can then be recharged with glutamate by glutamyl-tRNA synthetase.
[0697] GluTR is a protein of about 50 Kd (467 to 550 residues) which contains a few conserved regions. The best
conserved region is located in positions 99 to 122 in the sequence of known GluTR. This region seems important for
the activity of the enzyme. We have developed a signature pattern from that conserved region.
[0698] Consensus pattern: H-[LIVM]-x(2)-[LIVMHGSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-x-[EQR]-
[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR]. Sequences known to belong to this class detected by the pattern: ALL.
[0699] [1] Jahn D.. Verkamp E., Soell D. Trends Biochem. Sci. 17:215-218(1992). 
[0700] 234. (Glycoprotease) 
Glycoprotease family signature (aka Peptidase_M22)

[0701] Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted
by Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of
GCP is highly similar to the following uncharacterized proteins: 



Escherichia coli hypothetical protein ygjD (ORF-X). 
Bacillus subtilis hypothetical protein ydiE. 
Mycobacterium leprae hypothetical protein U229E. 
Mycobacterium tuberculosis hypothetical protein MtCY78.10. 
Synechocystis strain PCC 6803 hypothetical protein slr0807.
Methanococcus jannaschii hypothetical protein MJ1130. 
Haloarcula marismortui hypothetical protein in HSH 3' region. 
Yeast hypothetical protein YKR038c. 
Yeast hypothetical protein QRI7. 



[0702] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.
[0703] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[DQNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM].
Sequences known to belong to this class detected by the pattern: ALL.
[0704] Note: these proteins belong to family M22 in the classification of peptidases [2,E1].

[ 1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[ 2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[0705] 235. (Glucosamine_iso)

Glucosamine/galactosamine-6-phosphate isomerases signature

Glucosamine-6-phosphate isomerase (EC 5.3.1.10) (or Glc-6-P deaminase) is the enzyme responsible for the conversion
of glucosamine 6-phosphate into fructose 6-phosphate [1]. It is the last specific step in the pathway for
N-acetylglucosamine (GlcNAc) utilization in bacteria such as Escherichia coli (gene nagB) or in fungi such as Candida
albicans (gene NAG1). Glc-6-P isomerase is evolutionarily related to: - A putative Escherichia coli galactosamine-6-phosphate
isomerase (gene agaI) [2]. - Escherichia coli hypothetical protein yieK. - Bacillus subtilis hypothetical protein ybfT. As
a signature pattern a conserved region located in the central part of these enzymes was selected. This region contains
a conserved histidine which has been shown [1], in nagB, to be important for the pyranose ring-opening step of the
catalytic mechanism.

[0706] Consensus pattern: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H-

[ 1] Oliva G., Fontes M.R.M., Garratt R.C., Altamirano M.M., Calcagno M.L., Horjales E. Structure 3:1323-1332(1995).

[ 2] Reizer J., Ramseier T.M., Reizer A., Charbit A., Saier M.H. Jr. Microbiology 142:231-250(1996).
[0707] 236. Pneumovirus attachment glycoprotein G (glycoprotein G) 

[0708] This family includes attachment proteins from respiratory syncytial virus. Glycoprotein G has not been shown
to have any neuraminidase or hemagglutinin activity (Swiss-Prot). The amino terminus is thought to be cytoplasmic,
and the carboxyl terminus extracellular. The extracellular region contains four completely conserved cysteine residues.

[0709] [1] Johnson PR, Spriggs MK, Olmsted RA, Collins PL; Proc Natl Acad Sci U S A 1987;84:5625-5629
[0710] 237. Glycosyl transferases group 1 

[0711] Mutations in this domain of Swiss:P37287 lead to disease (paroxysmal nocturnal haemoglobinuria). Members
of this family transfer activated sugars to a variety of substrates, including glycogen, fructose-6-phosphate and
lipopolysaccharides. Members of this family transfer UDP-, ADP-, GDP- or CMP-linked sugars. The eukaryotic glycogen
synthases may be distant members of this family.
[0712] 238. Glycosyl transferases (Glycos_transf_2) 

[0713] Diverse family, transferring sugar from UDP-glucose, UDP-N-acetyl-galactosamine, GDP-mannose or
CDP-abequose, to a range of substrates including cellulose, dolichol phosphate and teichoic acids.
[0714] 239. (Glycos_transf_3)

Thymidine and pyrimidine-nucleoside phosphorylases signature

[0715] Thymidine phosphorylase (EC 2.4.2.4) catalyzes the reversible phosphorolysis of thymidine, deoxyuridine
and their analogues to their respective bases and 2-deoxyribose 1-phosphate. This enzyme regulates the availability
of thymidine and is therefore essential to nucleic acid metabolism.

[0716] In Escherichia coli (gene deoA), the enzyme is a dimer of identical subunits of about 48 Kd [1]. In humans it
was first identified as platelet-derived endothelial cell growth factor (PD-ECGF) [E1] before being recognized [2] as
thymidine phosphorylase.

[0717] Bacterial pyrimidine-nucleoside phosphorylase (EC 2.4.2.2) (gene pdp) [3] is an enzyme evolutionarily and
structurally related to thymidine phosphorylase.

[0718] As a signature pattern for these enzymes, a well conserved region of 19 residues located in the N-terminal part
of these proteins was selected.

[0719] Consensus pattern: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E. Sequences known to belong to this
class detected by the pattern: ALL.



[ 1] Walter M.R., Cook W.J., Cole L.B., Short S.A., Koszalka G.W., Krenitsky [...] J. Biol. Chem. 265:14016-14022(1990).

[ 2] [...] Yoshimura A., Sumizawa T., Haraguchi M., Akiyama S.-I., Fukui K., Yamada Y. Nature 356:668-[...](1992).
[ 3] Saxild H.H., Andersen L.N., Hammer K. J. Bacteriol. 178:424-434(1996).
[0720] 240. Glycos_transf_4. Glycosyl transferase. Number of members: 44.
[0721] [1] Medline: 95252686. A family of UDP-GlcNAc/MurNAc: polyisoprenol-P GlcNAc/MurNAc-1-P transferases.
Lehrman MA; Glycobiology 1994;4:768-771.

[0722] 241. Glycosyl hydrolases family 15. 21 members.
[0723] 242. Glycosyl hydrolases family 16 signature

It has been shown [1] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities:
- Bacterial beta-1,3-1,4-glucanases, or lichenases (EC 3.2.1.73), mainly from Bacillus but also from [...] Fibrobacter
succinogenes and Rhodothermus marinus (gene bglA).
- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).
- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).
- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).
- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).
Two conserved glutamates have been shown [2] to be involved in the catalytic mechanism [...]. The signature pattern
spans the region that includes these two active site residues.

Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA] [The two E's are active site residues]-
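Because the x(0,1) element is optional, this pattern can match segments of two different lengths. A minimal sketch, assuming the PROSITE repeat notation (n) and (n,m) and treating every element without a count as one residue, computes the shortest and longest segment a pattern can span; the function name is hypothetical.

```python
import re

# One pattern element with an optional repeat count such as (2) or (0,1).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def pattern_length_range(pattern: str) -> "tuple[int, int]":
    """Return (min, max) number of residues spanned by a PROSITE-style pattern."""
    shortest = longest = 0
    for element in "".join(pattern.split()).rstrip("-.").split("-"):
        _, low, high = _ELEMENT.fullmatch(element.strip("<>")).groups()
        low_n = int(low) if low is not None else 1       # no count means one residue
        high_n = int(high) if high is not None else low_n
        shortest += low_n
        longest += high_n
    return shortest, longest


print(pattern_length_range("E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA]"))  # (11, 12)
print(pattern_length_range("[AG]-x(4)-G-K-[ST]-"))                                 # (8, 8)
```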

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Juncosa M.. Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994). 
[0725] 243. Glycosyl hydrolases family 1 7 signature 

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities:
- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) (endo-(1->3)-beta-glucanase) [...] of plants against pathogens
through its ability to degrade fungal cell walls.
- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme may
play a role in cell expansion during growth, in cell-cell fusion during mating, and in spore [...].
- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.
The best conserved region in the sequence of these enzymes is located in their central section. This region contains a
conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and [...]. This
region was used as a signature pattern.

Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site residue]-
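The annotation [E is an active site residue] can be carried into a search by naming the corresponding regex group, so a match directly yields the position of the catalytic Glu. A minimal sketch, assuming a hand translation of the pattern into a Python regex; the group name active, the function name and the test sequences are hypothetical.

```python
import re

# [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ],
# with the annotated active-site Glu captured in a named group.
FAMILY17 = re.compile(r"[LIVM].[LIVMFYWA]{3}[STAG](?P<active>E)[STA]GWP[STN].[SAGQ]")


def active_site_position(seq: str) -> "int | None":
    """Return the 1-based position of the active-site Glu, or None if no match."""
    m = FAMILY17.search(seq)
    return m.start("active") + 1 if m else None


print(active_site_position("MKVLALLLSESGWPSAS"))  # 10
print(active_site_position("MKVLALLLSQSGWPSAS"))  # None
```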
[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[ 3] [...] Sci. U.S.A. 91:[...](1994).

[0726] 244. Glyoxalase I signatures 

Glyoxalase I (EC 4.4.1.5) (lactoylglutathione lyase) catalyzes the first step of the glyoxal pathway: the transformation
of methylglyoxal and glutathione into S-lactoylglutathione, which is then converted by glyoxalase II to lactic acid [...].

[...] monomeric [...] yeast enzyme [...] homodimeric. The sequence of glyoxalase I is well conserved. In bacteria and
[...] the enzyme is built out of the tandem repeat of a homologous domain. Two signature patterns for this family were
[...] protein and contains a conserved histidine that could be implicated in the binding of the zinc atom.
[0727] Consensus pattern: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF]-

[0728] [ 1] Kim N.-S., Umezawa Y., Ohmura S., Kato S. J. Biol. Chem. 268:11217-11221(1993).
[0729] 245. (Glypican) 
Glypicans signature 

Glypicans are a family of heparan sulfate proteoglycans which are anchored to cell membranes by a
glycosylphosphatidylinositol (GPI) linkage. Structurally, these proteins consist of three separate domains:

a) A signal sequence;

b) [...] bonds and which also contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains;
c) A C-terminal hydrophobic region which is post-translationally removed after formation of the GPI-anchor.
[0730] The proteins known to belong to this family are:

- Glypican 1 (GPC1).

- Glypican 2 (GPC2) or cerebroglycan.

- Glypican 3 (GPC3) [...].

- Glypican 4 (GPC4) or K-glypican.

- Glypican 5 (GPC5).

- Drosophila protein dally.

[0732] Consensus pattern: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-[...]

[ 1] Weksberg R., Squire J.A., Templeton D.M. Nat. Genet. 12:225-227(1996) 
[ 2] Watanabe K.. Yamada H.. Yamaguchi Y J. Cell Biol. 130:1207-1218(1995). 

[0733] 246. Granins signatures 

[...] granules of a [...] seem to be the precursors of [...] peptide hormones and neuropeptides [...] packaging of [...]

- Chromogranin A (CGA) [2]. CGA is a protein of about [...] which strongly inhibits [...] statin [...]

- [...] sulfated protein of about 600 residues.

- Secretogranin I (chromogranin B). A sulfated [...]

Apart from their subcellular location and [...], these proteins do not share many structural similarities. Only one short
region located in [...]

[0734] Consensus pattern: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L-

Consensus pattern: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C [The two C's are linked
by a disulfide bond]-

[ 1] [...] Sci. 16:27-30(1991).

[ 2] Simon J.-P., Aunis D. Biochem. J. 262:1-13(1989).

[0735] 247. grpE protein signature 

[...] to recycle more efficiently. GrpE is a protein of about 22 to 26 Kd. In yeast, an evolutionarily related [...].

[ 1] Georgopoulos C., Welch W. Annu. Rev. Cell Biol. 9:601-635(1993).

[0737] 248. Guanylate kinase signature and profile 

[...] and in vertebrates, GK is a highly conserved [...] to be structurally similar to the following proteins: - Protein A57R
(or [...]) [...]
[ 1] Stehle T., Schulz G.E. J. Mol. Biol. 224:1127-1141(1992).
[ 2] Bryant P.J., Woods D.F. Cell 68:621-622(1992).
[ 3] Goebl M.G. Trends Biochem. Sci. 17:99-99(1992).

[ 4] [...] J. Biochem. 213:263-269(1993).

[ 5] Woods D.F., Bryant P.J. Mech. Dev. 44:85-89(1994).

[0739] 249. (Glyco_hydro_35) 

Glycosyl hydrolases family 35 putative active site 

[...] these proteins belong to family 35 in the classification of glycosyl hydrolases [3,E1].

[...] Seymour G.B. Plant Physiol. 108:1099-1107(1995).

[ 3] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993).

[...] Sci. U.S.A. 92:[...].

[0744] 250. (Glyco_hydro_16) 
Glycosyl hydrolases family 16 signature 

- Bacillus circulans beta-1,3-glucanase A1 (EC 3.2.1.39) (gene glcA).

- Laminarinase (EC 3.2.1.6) from Clostridium thermocellum (gene lam1).

- Streptomyces coelicolor agarase (EC 3.2.1.81) (gene dagA).

- Alteromonas carrageenovora kappa-carrageenase (EC 3.2.1.83) (gene cgkA).

[0747] Consensus pattern: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA]-

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Juncosa M., Pons J., Dot T., Querol E., Planas A. J. Biol. Chem. 269:14530-14535(1994).
[0748] 251. (Glyco_hydro_17)
Glycosyl hydrolases family 17 signature 
(aka glycosyLhydrc4) 

It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the basis of
sequence similarities:

- Glucan endo-1,3-beta-glucosidases (EC 3.2.1.39) [...] through its ability to degrade fungal cell walls.

- Glucan 1,3-beta-glucosidase (EC 3.2.1.58) (exo-(1->3)-beta-glucanase) from yeast (gene BGL2). This enzyme [...]

- Lichenases (EC 3.2.1.73) (endo-(1->3,1->4)-beta-glucanase) from various plants.

The best conserved region in the sequence of these enzymes is located in their central section. This region
contains a conserved tryptophan residue which could be involved in the interaction with the glucan substrates [2] and
[...]. This region was used as a signature pattern.
[0751] Consensus pattern: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ] [E is an active site
residue]. Sequences known to belong to this class detected by the pattern: ALL.
[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Ori N., Sessa G., Lotan T., Himmelhoch S., Fluhr R. EMBO J. 9:3429-3436(1990).

[ 3] [...] Acad. Sci. U.S.A. 91:[...].

[0752] 252. (GIyco_hydro_3) 
Glycosyl hydrolases family 3 active site 

[...] can be classified into a single family on the basis of sequence similarities:

- Beta-glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces fragilis,
[...] Schizophyllum commune and Trichoderma reesei (bgl1).

- [...] Agrobacterium tumefaciens (Cbg1), Butyrivibrio fibrisolvens (bglA), Clostridium [...], Escherichia coli (bglX),
Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).

- Bacillus subtilis hypothetical protein yzbA.

- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

[...] is centered on a conserved aspartic acid residue which has been shown [...] to be an active site residue. This
region was used as a signature pattern.

[0755] Consensus pattern: [LIVM](2)-[KR]-x-[EQK]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI] [D is the
active site residue]. Sequences known to belong to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0756] 253. (Glyco_hydro_28) 
Polygalacturonase active site (aka PG) 

Polygalacturonase (EC 3.2.1.15) (PG) (pectinase) [1,2] catalyzes the random hydrolysis of 1,4-alpha-D-galactosiduronic
linkages in pectate and other galacturonans. In fruit, polygalacturonase plays an important role in cell wall metabolism
during ripening. In plant bacterial pathogens such as Erwinia [...] and [...], [...] is involved in maceration [...].

[...]
[0759] Prokaryotic and eukaryotic PG and exoPG share a few regions of sequence similarity. The best conserved of
these regions was selected. It is centered on a conserved histidine most probably involved in the catalytic mechanism.

[0760] Consensus pattern: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S [H is the putative
active site residue]. Sequences known to belong to this class detected by the pattern: ALL.
[0761] Note: these proteins belong to family 28 in the classification of glycosyl hydrolases [5].

[ 1] Ruttkowski E., Labitzke R., Khanh N.Q., Loeffler F., Gottschalk M., Jany K.-D. Biochim. Biophys. Acta 1087:104-106(1990).

[ 2] Huang J., Schell M.A. J. Bacteriol. 172:3879-3887(1990).

[ 3] He S.Y., Collmer A. J. Bacteriol. 172:4988-4995(1990).

[ 4] Bussink H.J.D., Buxton F.P., Visser J. Curr. Genet. 19:467-474(1991).

[ 5] Henrissat B. Biochem. J. 280:309-316(1991).

[0762] 254. (Glyco_hydro_32) 
Glycosyl hydrolases family 32 active site

[0763] It has been shown [1,2] that the following glycosyl hydrolases can be classified into a single family on the
basis of sequence similarities:

- Inulinase (EC 3.2.1.7) (or inulase) from the fungus Kluyveromyces marxianus.

- Beta-fructofuranosidase (EC 3.2.1.26), commonly known as invertase in fungi and plants and as sucrase in bacteria
(gene sacA or scrB).

- Raffinose invertase (EC 3.2.1.26) (gene rafD) from Escherichia coli plasmid pRSD2.

- Levanase (EC 3.2.1.65) (gene sacC) from Bacillus subtilis.

[0764] One of the conserved regions in these enzymes is located in the N-terminal section and contains an aspartic
acid residue which has been shown [3], in yeast invertase, to be important for the catalytic mechanism. This region was
used as a signature pattern.

[0765] Consensus pattern: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G [D is the active site residue]. Sequences known to belong
to this class detected by the pattern: ALL.

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Gunasekaran P., Karunakaran T., Cami B., Mukundan A.G., Preziosi L., Baratti J. J. Bacteriol. 172:6727-6735(1990).

[ 3] Reddy V.A., Maley F. J. Biol. Chem. 265:10817-10820(1990).

[0766] 255. (Glyco_hydro_1) 
Glycosyl hydrolases family 1 signatures 

[0767] It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities,
classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus polymyxa,
and Caldocellum saccharolyticum.

- Two plant (clover) beta-glucosidases (EC 3.2.1.21).

- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and
lacS).

- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis,
and Staphylococcus aureus.

- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia
chrysanthemi (gene arbB).

- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).

- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein,
is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which
contains four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl
hydrolases.

[0768] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has
been shown [5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by
acting as a nucleophile. This region was used as a signature pattern. As a second signature pattern we selected a
region [...] the N-terminal extremity of these enzymes; this region also contains a glutamic acid residue.
[0769] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site
residue]. Sequences known to belong to this class detected by the pattern: ALL.

[0770] Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the
LPH precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0771] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA]. Sequences
known to belong to this class detected by the pattern: ALL.
[0772] Note: this pattern will pick up the last three domains of LPH. 
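The two family 1 consensus patterns above can be applied together to flag candidate members of the family. A minimal sketch, assuming hand translations of both patterns into Python regexes; the dictionary labels, the function name and the synthetic test sequence are hypothetical and are not official signature identifiers.

```python
import re

# The two family 1 patterns, hand-translated from PROSITE notation.
# The dictionary keys are hypothetical labels chosen for this example.
FAMILY1_SIGNATURES = {
    "active_site_glu": re.compile(r"[LIVMFSTC][LIVFYS][LIV][LIVMST]ENG[LIVMFAR][CSAGN]"),
    "n_terminal_region": re.compile(r"F.[FYWM][GSTA].[GSTA].[GSTA]{2}[FYNH][NQ].E.[GSTA]"),
}


def signature_hits(seq: str) -> "list[str]":
    """Return the sorted labels of every family 1 signature found in seq."""
    return sorted(name for name, rx in FAMILY1_SIGNATURES.items() if rx.search(seq))


demo = "FAWSASASSFNAEAS" + "GG" + "LLLLENGLC"  # synthetic, not a real protein
print(signature_hits(demo))        # ['active_site_glu', 'n_terminal_region']
print(signature_hits("MKTAYIAK"))  # []
```

Reporting each signature separately, rather than a single yes/no answer, mirrors the notes above: the two patterns pick up different subsets of the LPH domains.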

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991). 

[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).

[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:5887-5889(1990).

[0773] 256. Glyco_hydro_20 

Glycosyl hydrolase family 20 

Previous Pfam IDs: glycosyl_hydr11;

Number of members: 33 

[0774] 257. (Glyco_hydro_9)

Glycosyl hydrolases family 9 active sites signatures 

(aka Glycosyl_hydr12) 

[0775] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).
- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).
- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).
- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).
- Fibrobacter succinogenes endoglucanase A (endA).
- Pseudomonas fluorescens endoglucanase A (celA).
- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).
- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest the spore cell wall during germination, to release the enclosed amoeba.
- Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be involved in the fruit ripening process.

[0776] Two of the most conserved regions in these enzymes are centered on conserved residues which have been shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The first region contains an active site histidine and the second region contains two catalytically important residues: an aspartate and a glutamate. Both regions were used as signature patterns.

[0777] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue]. Sequences known to belong to this class detected by the pattern: ALL, except for Cellulomonas fimi cenC and Streptomyces reticuli cel1.

[0778] Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues]. Sequences known to belong to this class detected by the pattern: ALL, except for Fibrobacter succinogenes endA, whose sequence seems to be incorrect.

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).
[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318(1991).

[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0779] 258. Matrix protein (MA), p15 (GAG_ma) 

[0780] The matrix protein, p15, is encoded by the gag gene. MA is involved in pathogenicity [1].
[0781] [1]: Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM. J Virol 1993;67:5989-5999.
[0782] 259. Gag polyprotein, inner coat protein p12 (GAG_P12)

[0783] The retroviral p12 is a virion structural protein; p12 is proline rich. The function carried out by p12 in assembly and replication is unknown. p12C is associated with pathogenicity of the virus.

[1] Pozsgay JM, Beilharz MW, Wines BD, Hess AD, Pitha PM. J Virol 1993;67:5989-5999.

[0784] 260. Glutamine synthetase signatures (GLN-SYNT) 

Glutamine synthetase (EC 6.3.1.2) (GS) [1] plays an essential role in the metabolism of nitrogen by catalyzing the condensation of glutamate and ammonia to form glutamine. There seem to be three different classes of GS [2,3,4]:

- Class I enzymes (GSI) are specific to prokaryotes and are oligomers of 12 identical subunits. The activity of GSI-type enzymes is controlled by the adenylation of a tyrosine residue; the adenylated enzyme is inactive.
- Class II enzymes (GSII) are found in eukaryotes and in bacteria belonging to the Rhizobiaceae, Frankiaceae and Streptomycetaceae families (these bacteria also have a class-I GS). GSII are octamers of identical subunits. Plants have two or more isozymes of GSII; one of the isozymes is translocated into the chloroplast.
- Class III enzymes (GSIII) have, currently, only been found in Bacteroides fragilis and in Butyrivibrio fibrisolvens. GSIII is a hexamer of identical chains. It is much larger (about 700 amino acids) than the GSI (450 to 470 amino acids) or GSII (350 to 420 amino acids) enzymes.

While the three classes of GS are clearly structurally related, the sequence similarities are not so extensive. As signature patterns, three conserved regions were selected. The first pattern is based on a conserved tetrapeptide in the N-terminal section of the enzyme; the second is based on a glycine-rich region which is thought to be involved in ATP binding. The third pattern is specific to class I glutamine synthetases and includes the tyrosine residue which is reversibly adenylated.
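The subunit counts and chain lengths quoted above can be combined into a rough size for each assembled enzyme. The sketch below does only that arithmetic: the class table restates the text, while the figure of about 110 Da per residue is a common rule of thumb and an assumption, not a value from the source.

```python
# Quaternary structure and chain-length ranges as stated in the text:
# (number of identical subunits, shortest chain, longest chain).
GS_CLASSES = {
    "GSI": (12, 450, 470),
    "GSII": (8, 350, 420),
    "GSIII": (6, 700, 700),  # "about 700 amino acids"
}

# Assumed average residue mass in daltons (rule of thumb, not from the text).
AVG_RESIDUE_DA = 110


def holoenzyme_residues(gs_class: str) -> tuple[int, int]:
    """Total residues in the assembled enzyme, as a (low, high) range."""
    subunits, shortest, longest = GS_CLASSES[gs_class]
    return subunits * shortest, subunits * longest


def approx_mass_kda(gs_class: str) -> tuple[float, float]:
    """Rough holoenzyme mass range in kDa using AVG_RESIDUE_DA."""
    low, high = holoenzyme_residues(gs_class)
    return low * AVG_RESIDUE_DA / 1000, high * AVG_RESIDUE_DA / 1000


for name in GS_CLASSES:
    print(name, holoenzyme_residues(name), approx_mass_kda(name))
# Residue totals: GSI 5400-5640, GSII 2800-3360, GSIII about 4200
```

On these figures the GSIII hexamer (about 4200 residues) falls between the GSII octamer and the GSI dodecamer, even though its individual chain is the longest of the three classes.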

[0785] Consensus pattern: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-[LIVMFY].

Consensus pattern: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-S.

Consensus pattern: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y [Y is the site of adenylation].

[ 1] Eisenberg D., Almassy R.J., Janson C.A., Chapman M.S., Suh S.W., Cascio D., Smith W.W. Cold Spring Harbor Symp. Quant. Biol. 52:483-490(1987).

[ 2] Kumada Y., Benson D.R., Hillemann D., Hosted T.J., Rochefort D.A., Thompson C.J., Wohlleben W., Tateno Y. Proc. Natl. Acad. Sci. U.S.A. 90:3009-3013(1993).

[ 3] Shatters R.G., Kahn M.L. J. Mol. Evol. 29:422-428(1989).

[ 4] Brown J.R., Masuchi Y., Robb F.T., Doolittle W.F. J. Mol. Evol. 38:566-576(1994).

[0786] 261. Globins profile (globin1)

Globins are heme-containing proteins involved in binding and/or transporting oxygen [1]. They belong to a very large and well-studied family which is widely distributed in many organisms. The major groups of globins are:

- Hemoglobins (Hb) from vertebrates. Hb is the protein responsible for transporting oxygen from the lungs to other tissues. It is a tetramer of two alpha and two beta chains. Most vertebrate species also express specific embryonic or fetal forms of hemoglobin in which the alpha or the beta chains are replaced by a chain with higher oxygen affinity, as for the gamma, delta, epsilon and zeta chains in mammals.
- Myoglobins (Mb) from vertebrates. Mb is a monomeric protein responsible for oxygen storage in muscles.
- Invertebrate globins [2]. A wide variety of globins are found in invertebrates. Molluscs generally have one or two muscle globins which are either monomeric or dimeric. Insects, such as the midge Chironomus thummi, have a large set of extracellular globins. Nematodes and annelids have a variety of intracellular and extracellular globins; some of them are multi-domain polypeptides (from two up to nine-domain globins) and some produce large, disulfide-bonded aggregates.
- Leghemoglobins (Lg) from the root nodules of leguminous plants. Lg provides oxygen for bacteroids.
- Flavohemoproteins from bacteria (Escherichia coli hmpA) and fungi [3]. These proteins consist of two distinct domains: an N-terminal globin domain and a C-terminal FAD-containing reductase domain. In bacteria such as Vitreoscilla, the enzyme-associated globin is a single-domain protein.

All these globins seem to have evolved from a common ancestor. The profile developed to detect members of the globin family is based on a structural alignment of selected globin sequences.
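The globin entry relies on a profile rather than a single consensus pattern. PROSITE generalized profiles also score insertions and deletions; as a much simpler, hedged illustration of the underlying idea, the sketch below builds an ungapped position-specific scoring matrix (log-odds in bits against a uniform background, with pseudocounts) from a few equal-length aligned segments and slides it along a query. The aligned segments and the query are made-up toy strings, not globin sequences, and the uniform background is an assumption.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM (bits) from equal-length, gap-free aligned segments.

    Assumes a uniform background frequency of 1/20 for every residue.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(len(aligned[0])):
        counts = Counter(segment[i] for segment in aligned)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Slide the PSSM along the sequence; return (best score, start index)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[i][sequence[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


# Toy aligned segments and query (hypothetical, not globin sequences).
pssm = build_pssm(["SPELKAHA", "NPALQSHA", "SPKLRAHA"])
score, start = best_window(pssm, "MMMMSPELKAHAMMMM")
print(start, round(score, 2))  # start is 4
```

Unlike a pattern, which either matches or not, a profile yields a graded score, so a cutoff has to be chosen; a real globin profile would also need gap penalties, which this sketch omits.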

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin New-York (1988).

[ 2] Goodman M. [...] Vinogradov S. J. Mol. Evol. 27:[...]

[0787] Plant hemoglobins signature (globin2) 

Leghemoglobins [1] are hemoproteins present in the root nodules of leguminous plants. Leghemoglobins are structurally [...] bacteroids [...] essential [...]. A pattern was developed that picks up the sequence of plant hemoglobins exclusively.

[0788] Consensus pattern: [SN]-P-x-L-x(2)-H-A-x(3)-F.

[ 1] Powell R., Gannon F. BioEssays 9:117-121(1988).

[ 2] Kortt A.A., Trinick M.J., Appleby C.A. Eur. J. Biochem. 175:141-149(1988).

[ 3] Kortt A.A., Inglis A.S., Fleming A.I., Appleby C.A. FEBS Lett. 231:341-346(1988).

[ 4] Bogusz D., Appleby C.A., Landsmann J., Dennis E.S., Trinick M.J., Peacock W.J. Nature 331:178-180(1988).
[0789] 262. Fructose-bisphosphate aldolase class-I active site (glycolytic_enz)

[0790] Fructose-bisphosphate aldolase [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone phosphate and glyceraldehyde 3-phosphate. There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-I aldolases [3], mainly found in higher eukaryotes, are homotetrameric enzymes which form a Schiff-base intermediate between the [...] (dihydroxyacetone phosphate) and the epsilon-amino group of a lysine residue. In vertebrates, three forms of this enzyme are found: aldolase A in muscle, aldolase B in liver and aldolase C in brain. [...] used as a signature pattern.

[0791] Consensus pattern: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN] [K is involved in Schiff-base formation].

[ 1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).

[ 2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[ 3] Freemont P.S., Dunbar B., Fothergill-Gilmore L.A. Biochem. J. 249:779-788(1988).

[0792] 263. Glycosyl hydrolases family 11 active sites signatures

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria [...] family 11 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Aspergillus awamori xylanase C (xynC).
- Bacillus circulans, pumilus, stearothermophilus and subtilis xylanase (xynA).
- [...] (xynB).
- Clostridium stercorarium xylanase A (xynA).
- Fibrobacter succinogenes xylanase C (xynC), which consists of two catalytic domains that both belong to family 10.
- Neocallimastix patriciarum xylanase A (xynA).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of [...] that belongs to family 10 of glycosyl hydrolases.
- Schizophyllum commune xylanase A.
- Streptomyces lividans xylanases B (xlnB) and C (xlnC).
- [...]

[...] centered on glutamic [...]. Both regions were used as signature patterns.

[0793] Consensus pattern: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN] [E is an active site residue].
Consensus pattern: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF] [E is an active site residue].

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] [...] Biochem. J. 288:117-121(1992).

[0794] 264. Glycosyl hydrolase family 14
[0795] This family consists of beta-amylases.
[0796] 265. Glycosyl hydrolases family 1 signatures

It has been shown [1 to 4] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from various bacteria such as Agrobacterium strain ATCC 21400, Bacillus polymyxa, and Caldocellum saccharolyticum.
- Two plant (clover) beta-glucosidases (EC 3.2.1.21).
- Two different beta-galactosidases (EC 3.2.1.23) from the archaebacterium Sulfolobus solfataricus (genes bgaS and lacS).
- 6-phospho-beta-galactosidases (EC 3.2.1.85) from various bacteria such as Lactobacillus casei, Lactococcus lactis and Staphylococcus aureus.
- 6-phospho-beta-glucosidases (EC 3.2.1.86) from Escherichia coli (genes bglB and ascB) and from Erwinia chrysanthemi (gene arbB).
- Plant myrosinases (EC 3.2.3.1) (sinigrinase) (thioglucosidase).
- Mammalian lactase-phlorizin hydrolase (LPH) (EC 3.2.1.108 / EC 3.2.1.62). LPH, an integral membrane glycoprotein, is the enzyme that splits lactose in the small intestine. LPH is a large protein of about 1900 residues which contains four tandem repeats of a domain of about 450 residues which is evolutionarily related to the above glycosyl hydrolases.

One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown [5], in the beta-glucosidase from Agrobacterium, to be directly involved in glycosidic bond cleavage by acting as a nucleophile. This region was used as a signature pattern. As a second signature pattern, a conserved region found in the N-terminal extremity of these enzymes was selected; this region also contains a glutamic acid residue.

[0797] Consensus pattern: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN] [E is the active site residue].

Note: this pattern will pick up the last two domains of LPH; the first two domains, which are removed from the LPH precursor by proteolytic processing, have lost the active site glutamate and may therefore be inactive [4].
[0798] Consensus pattern: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA].

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Henrissat B. Protein Seq. Data Anal. 4:61-62(1991). 

[ 3] Gonzalez-Candelas L., Ramon D., Polaina J. Gene 95:31-38(1990).

[ 4] El Hassouni M., Henrissat B., Chippaux M., Barras F. J. Bacteriol. 174:765-777(1992).

[ 5] Withers S.G., Warren R.A.J., Street I.P., Rupitz K., Kempton J.B., Aebersold R. J. Am. Chem. Soc. 112:5887-5889(1990).

[0799] 266. Glycosyl hydrolases family 2 signatures 

It has been shown [1,2,E1] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified into a single family:

- Beta-galactosidases (EC 3.2.1.23) from bacteria such as Escherichia coli (genes lacZ and ebgA), Clostridium acetobutylicum, Clostridium thermosulfurogenes, Klebsiella pneumoniae, Lactobacillus delbrueckii or Streptococcus thermophilus, and from the fungus Kluyveromyces lactis.
- Beta-glucuronidase (EC 3.2.1.31) from Escherichia coli (gene uidA) and from mammals.

One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown [3], in Escherichia coli lacZ, to be the general acid/base catalyst in the active site of the enzyme. This region was used as a signature pattern. As a second signature pattern, a highly conserved region located some sixty residues upstream from the active site glutamate was selected.
[0800] Consensus pattern: [...]-[STACN](2)-H-Y-P-x(4)-[LIVMFYWS](2)-x(3)-[DN]-x(2)-G-[LIVM...]-x(4).

Consensus pattern: [DENQLF]-[KRVW]-N-[HRY]-[STAPV]-[SAC]-[LIVMFS](3)-W-[GS]-x(2,3)-N-E [E is the active site residue].

[ 1] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 2] Schroeder C.J., Robert C., Lenzen G., McKay L.L., Mercenier A. J. Gen. Microbiol. 137:369-380(1991).
[ 3] Gebler J.C., Aebersold R., Withers S.G. J. Biol. Chem. 267:11126-11130(1992).

[0801] 267. Glycosyl hydrolases family 3 active site 

It has been shown [1,2] that the following glycosyl hydrolases can be, on the basis of sequence similarities, classified into a single family:

- Beta-glucosidases (EC 3.2.1.21) from the fungi Aspergillus wentii (A-3), Hansenula anomala, Kluyveromyces fragilis, Saccharomycopsis fibuligera (BGL1 and BGL2), Schizophyllum commune and Trichoderma reesei (BGL1).
- Beta-glucosidases from the bacteria Agrobacterium tumefaciens (cbg1), Butyrivibrio fibrisolvens (bglA), Clostridium thermocellum (bglB), Escherichia coli (bglX), Erwinia chrysanthemi (bgxA) and Ruminococcus albus.
- Alteromonas strain O-7 beta-hexosaminidase A (EC 3.2.1.52).
- Bacillus subtilis hypothetical protein yzbA.
- Escherichia coli hypothetical protein ycfO and HI0959, the corresponding Haemophilus influenzae protein.

One of the conserved regions in these enzymes is centered on a conserved aspartic acid residue which has been [...].

[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Castle L.A., Smith K.D., Morris R.O. J. Bacteriol. 174:1478-1486(1992).
[ 3] Bause E., Legler G. Biochim. Biophys. Acta 626:459-465(1980).

[0803] 268. Glycosyl hydrolases family 8 signature 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family D [3] or as the glycosyl hydrolases family 8 [...]. The enzymes which are currently known to belong to this family are listed below.

- Acetobacter xylinum [...] endoglucanase.
- Clostridium cellulolyticum endoglucanase C (celCCC).
- Clostridium [...]
- [...] endoglucanase Y (celY).
- Bacillus circulans beta-glucanase (EC 3.2.1.73).
- Escherichia coli hypothetical protein yhjM.

The most conserved region in these enzymes [...] aspartate [...] thought to act as the nucleophile in the catalytic mechanism. This region was used as a signature pattern.

Consensus pattern: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW] [The first D is an active site residue].

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).

[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).

[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Alzari P.M., Souchon H., Dominguez R. Structure 4:265-275(1996).

[0804] 269. Glycosyl hydrolases family 9 active sites signatures 

The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family E [3] or as the glycosyl hydrolases family 9 [4,E1]. The enzymes which are currently known to belong to this family are listed below.

- Butyrivibrio fibrisolvens cellodextrinase 1 (ced1).
- Cellulomonas fimi endoglucanases B (cenB) and C (cenC).
- Clostridium cellulolyticum endoglucanase G (celCCG).
- Clostridium cellulovorans endoglucanase C (engC).
- Clostridium stercorarium endoglucanase Z (avicelase I) (celZ).
- Clostridium thermocellum endoglucanases D (celD), F (celF) and I (celI).
- Fibrobacter succinogenes endoglucanase A (endA).
- Pseudomonas fluorescens endoglucanase A (celA).
- Streptomyces reticuli endoglucanase 1 (cel1).
- Thermomonospora fusca endoglucanase E-4 (celD).
- Dictyostelium discoideum spore germination specific endoglucanase 270-6. This slime mold enzyme may digest the spore cell wall during germination, to release the enclosed amoeba.
- Endoglucanases from plants such as avocado or French bean. In plants this enzyme may be involved in the fruit ripening process.

Two of the most conserved regions in these enzymes are centered on conserved residues which have been shown [5,6], in the endoglucanase D from Clostridium thermocellum, to be important for the catalytic activity. The first region contains an active site histidine and the second region contains two catalytically important residues: an aspartate and a glutamate. Both regions were used as signature patterns.
[0805] Consensus pattern: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R [H is an active site residue].
Consensus pattern: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA] [D and E are active site residues].

[ 1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[ 2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[ 3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[ 4] Henrissat B. Biochem. J. 280:309-316(1991).

[ 5] Tomme P., Chauvaux S., Beguin P., Millet J., Aubert J.-P., Claeyssens M. J. Biol. Chem. 266:10313-10318(1991).
[ 6] Tomme P., van Beeumen J., Claeyssens M. Biochem. J. 285:319-324(1992).
[0806] 270. Glyceraldehyde 3-phosphate dehydrogenase active site (gpdh)

Glyceraldehyde 3-phosphate dehydrogenase (EC 1.2.1.12) (GAPDH) [1] is a tetrameric NAD-binding enzyme common to both the glycolytic and gluconeogenic pathways. A cysteine in the middle of the molecule is involved in forming a covalent phosphoglycerol thioester intermediate. The sequence around this cysteine is totally conserved in eubacterial and eukaryotic GAPDHs and is also present, albeit in a variant form, in the otherwise highly divergent archaebacterial GAPDH [2]. Escherichia coli D-erythrose 4-phosphate dehydrogenase (E4PDH) (gene epd or gapB) is an enzyme highly related to GAPDH [3].

[0807] Consensus pattern: [ASV]-S-C-[NT]-T-x(2)-[LIM] [C is the active site residue].

[ 1] Harris J.I., Waters M. (In) The Enzymes (3rd edition) 13:1-50(1976).

[ 2] Fabry S., Lang J., Niermann T., Vingron M., Hensel R. Eur. J. Biochem. 179:405-413(1989).

[ 3] Zhao G., Pease A.J., Bharani N., Winkler M.E. J. Bacteriol. 177:2804-2812(1995).

[0808] 271. Granulins signature 

Granulins [1] are a family of cysteine-rich peptides of about 8 Kd which may have multiple biological activities. A precursor protein (known as acrogranin) potentially encodes seven different forms of granulin (grnA to grnG) which are probably released by post-translational proteolytic processing. A schematic representation of the structure of a granulin is shown below:

xxxCxxxxxCxxxxxCCxxxxxxxxCCxxxxxxCCxxxxxCCxxxxxCxxxxxxCx

'C': conserved cysteine probably involved in a disulfide bond.
'*': position of the pattern.

Granulins are evolutionarily related to PMP-D1, a peptide extracted from the pars intercerebralis of migratory locusts [2].

[0809] Consensus pattern: C-x-D-x(2)-H-C-C-P-x(4)-C [the four C's are probably involved in disulfide bonds].

[ 1] Bhandari V., Palfree R.G., Bateman A. Proc. Natl. Acad. Sci. U.S.A. 89:1715-1719(1992). 
[ 2] Nakakura N., Hietter H., van Dorsselaer A., Luu B. Eur. J. Biochem. 204:147-153(1992).

[0810] 272. (HCV RdRp) Hepatitis C virus RNA-dependent RNA polymerase

[0811] The RNA-dependent RNA polymerase is also known as non-structural protein NS5B. NS5B is a 65 kDa protein that resembles other viral RNA polymerases. HCV replication is thought to occur in membrane-bound replication complexes. These complexes transcribe the positive strand, and the resulting minus strand is used as a template for the synthesis of genomic RNA. Two viral proteins, NS3 and NS5B, are involved in the reaction [1,2].
[0812] [1] Lohmann V, Korner F, Herian U, Bartenschlager R; J Virol 1997;71:8416-8428.
[2] Behrens SE, Tomei L, De Francesco R; EMBO J 1996;15:12-22.
[3] Ishido S, Fujita T, Hotta H; Biochem Biophys Res Commun 1998;244:35-40.
[0813] 273. (HHH) Helix-hairpin-helix motif.

[0814] [1] Doherty AJ, Serpell LC, Ponting CP; Nucleic Acids Res 1996;24:2488-2497.
[0815] 274. HIT family signature 

Recently a family of small proteins of about 12 to 16 Kd has been described [1]. This family currently consists of:

- Mammalian protein HINT (also known as protein kinase C inhibitor 1 or PKCI-1). HINT was incorrectly thought to be a specific inhibitor of PKC. It has been shown to bind zinc.
- Fission yeast diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [2] (gene aph1), which cleaves A-5'-PPPP-5'-A to yield AMP and ATP.
- FHIT, a human protein whose gene is altered in different tumors and which acts [3] as a diadenosine 5',5'''-P1,P3-triphosphate hydrolase (Ap3Aase) (EC 3.6.1.29), cleaving A-5'-PPP-5'-A to yield AMP and ADP.
- Yeast proteins HNT1 and HNT2.
- Maize zinc-binding protein ZBP14.
- Escherichia coli hypothetical protein yclR.
- Haemophilus influenzae hypothetical protein HI0961.
- Helicobacter pylori hypothetical protein HP0404.
- Methanococcus jannaschii hypothetical protein MJ0866.
- Mycobacterium leprae hypothetical protein U296A.
- Synechocystis strain PCC 6803 hypothetical protein slr1234.
- Caenorhabditis elegans hypothetical protein F21C3.3.
- A hypothetical 13.2 Kd protein in hisE 3'region in Azospirillum brasilense.
- A hypothetical 13.1 Kd protein in p37 5'region in Mycoplasma hyorhinis.
- A hypothetical 12.4 Kd protein in psbAII 5'region in Synechococcus strain PCC 7942.

All these proteins contain a region with three clustered histidines. This region is responsible for the designation of this family: HIT, for 'HIstidine Triad' [1]. This region was originally thought to be implicated in the binding of a zinc ion but was later identified [4] as part of the alpha-phosphate binding site of a nucleotide-binding domain. As a signature pattern, the region of the histidine triad was selected.
[0816] Consensus pattern: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-[PSGA].
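Because the HIT signature fixes where the three histidines sit inside the matched window, a hit can be translated directly into residue coordinates. The sketch below hand-translates the pattern above into a Python regular expression (wrapped in a lookahead so that overlapping hits are not missed) and reports the 0-based positions of the histidine triad. The offsets 11, 13 and 15 are counted from the pattern as written; the function name and the test sequence are hypothetical, and the sequence is a toy string built to match, not a real HIT protein.

```python
import re

# [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-[PSGA]
# translated by hand; the lookahead lets overlapping windows be reported.
HIT_SIGNATURE = re.compile(
    r"(?=[NQA].{4}[GAV].[QF].[LIVM].H[LIVMFYT]H[LIVMFT]H[LIVMF]{2}[PSGA])"
)
# Offsets of the three H elements within a 19-residue match.
TRIAD_OFFSETS = (11, 13, 15)


def histidine_triads(sequence: str) -> list[tuple[int, ...]]:
    """Return 0-based (H, H, H) positions for every signature match."""
    return [
        tuple(m.start() + offset for offset in TRIAD_OFFSETS)
        for m in HIT_SIGNATURE.finditer(sequence)
    ]


toy = "MSNAAAAGAQALAHVHVHLLPK"  # hypothetical sequence built to match
print(histidine_triads(toy))  # [(13, 15, 17)]
```

The same offset bookkeeping would apply to any pattern in this section whose annotated residue (for example an active-site C or E) sits at a fixed position in the match.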

[ 1] Seraphin B. DNA Seq. 3:177-179(1992).

[ 2] Huang Y., Garrison P.N., Barnes L.D. Biochem. J. 312:925-932(1995).

[ 3] Barnes L.D., Garrison P.N., Siprashvili Z., Guranowski A., Robinson A.K., Ingram S.W., Croce C.M., Ohta M., Huebner K. Biochemistry 35:11529-11535(1996).

[ 4] [...] Lowenstein J.M. Nat. Struct. Biol. 4:[...]

[0817] 275. Myc-type 'helix-loop-helix' dimerization domain signature (HLH)

[...] herculin [...], in birds CMD1 (QMF-1) [...]
[...] dependent transcription in collaboration with E47 [...]
[...] which may play an important role in [...]
[...] activity of the achaete-scute [...] mammalian Id proteins.
- Human Sterol Regulatory Element Binding Protein 1 (SREBP-1) [...] found in the [...]
- [...] proteins T3 (l'sc), T4 (scute), T5 (achaete) and T8 [...] achaete-scute (AS-C) complex [...] determination of the neuronal precursors in the peripheral nervous system [...]
- Mammalian homologs of achaete-scute proteins, the MASH [...] neurogenesis.
- Slug [...]
- [...] repressor [...]
- Drosophila [...]
[...] process. The schematic representation of the helix-loop-helix domain is shown here: [...]

[ 1] Murre C., McCaw P.S., Baltimore D. Cell 56:777-783(1989).
[ 2] Garrel J., Campuzano S. BioEssays 13:493-498(1991).
[ 3] Kato G.J., Dang C.V. FASEB J. 6:3065-3072(1992).

[ 4] Krause M., Fire A., Harrison S.W., Priess J., Weintraub H. Cell 63:907-919(1990).
[ 5] Riechmann V., van Cruechten I., Sablitzky F. Nucleic Acids Res. 22:749-755(1994).

[0819] 276. HMG14 and HMG17 signature

[...] proteins may be involved in [...] conformation. The trout nonhistone chromosomal protein [...] also belongs to this family. As a signature pattern, a conserved stretch of 10 residues located in the N-terminal section of HMG14 and HMG17 was selected.
[0820] Consensus pattern: R-R-S-A-R-L-S-A-[RK]-P.

[0821] [ 1] Bustin M., Reeves R. Prog. Nucleic Acid Res. Mol. Biol. 54:35-100(1996). 
[0822] 277. Hydroxymethylglutaryl-coenzyme A lyase active site (HMGL1)

Hydroxymethylglutaryl-coenzyme A lyase (HMG-CoA lyase) [...] (EC 4.1.3.4) catalyzes the transformation of HMG-CoA into acetyl-CoA and acetoacetate. In vertebrates it is a mitochondrial enzyme which is involved in ketogenesis [...]. In Pseudomonas mevalonii it is involved in mevalonate catabolism (gene mvaB). A cysteine has been shown [2], in mvaB, to be required for the activity of the enzyme. The region around this residue is perfectly conserved and is used as a signature pattern.
[0823] Consensus pattern: S-V-A-G-L-G-G-C-P-Y [C is the active site residue].

[ 1] [...] Mende-Mueller L.M., Schappert K., Lee C., Gibson K.M., Miziorko H.M. J. Biol. Chem. 268:4376-4381(1993).

[ 2] Hruz P.W., Narasimhan C., Miziorko H.M. Biochemistry 31:6842-6847(1992).
[0824] Alpha-isopropylmalate and homocitrate synthases signatures (HMGL2)

The following enzymes have been shown [1] to be functionally as well as evolutionarily related:

- Alpha-isopropylmalate synthase [...], which catalyzes [...] the condensation of acetyl-CoA and alpha-ketoisovalerate to form 2-isopropylmalate.
- Homocitrate synthase (EC 4.1.3.21) [...] the iron-molybdenum cofactor of nitrogenase [...].
- [...]
- Soybean late nodulin 56.
- Methanococcus jannaschii hypothetical proteins MJ0503, MJ1195 and MJ1392.

Two conserved regions were selected as signature patterns [...]. [...] region is located in the central section and contains two conserved histidine residues which could be implicated in the catalytic mechanism.

[0825] Consensus pattern: L-R-[DE]-G-x-Q-x(10)-K.

Consensus pattern: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI].

[0826] [ 1] Wang S.-Z., Dean D.R., Chen J.-S., Johnson J.L. J. Bacteriol. 173:3041-3046(1991).

[0827] 278. Hydroxymethylglutaryl-coenzyme A synthase active site

Hydroxymethylglutaryl-coenzyme A synthase (EC 4.1.3.5) (HMG-CoA synthase) catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA [...] a cytosolic form which is the starting point of the mevalonate pathway which leads to cholesterol and other sterolic and [...] compounds, and a mitochondrial form responsible for ketone body biosynthesis. HMG-CoA synthase is also found in [...] nucleophile in the first [...].

Consensus pattern: [...]-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G [C is the active site residue].

[ 1] [...] J.D. Arch. Biochem. Biophys. 312:[...]

[0830] 279. HMG (high mobility group) box
[0831] 280. HSF-type DNA-binding domain signature

[0832] Heat shock factor (HSF) is a DNA-binding protein that specifically binds heat shock promoter elements (HSE). HSE is a palindromic element rich with repetitive purine and pyrimidine motifs: 5'-nGAAnnTTCnnGAAnnTTCn-3'. HSF is expressed at normal temperatures but is activated by heat shock or chemical stressors [1,2]. The sequences of HSF from various species show extensive similarity in a region of about 90 amino acids, which has been shown [3] to bind DNA. Some other proteins also contain an HSF domain; these are:

- Yeast SFL1, a protein involved in cell surface assembly and regulation of the gene related to flocculation (asexual cell aggregation) [4].
- Yeast transcription factor SKN7 (or BRY1 or POS9), which binds to the promoter elements SCB and MCB essential for the control of G1 cyclin expression [5].
- Yeast MGA1.
- Yeast hypothetical protein YJR147w.

A pattern was derived from the most conserved part of the HSF DNA-binding domain, its central region.
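The HSE consensus quoted above is palindromic in the double-stranded sense: its reverse complement, read 5' to 3', is the same string. The sketch below checks that property and scans a DNA string for matches, with `n` standing for any base. It assumes the consensus is written as 5'-nGAAnnTTCnnGAAnnTTCn-3'; the helper names and the test sequence are illustrative only.

```python
import re

# Watson-Crick complements; "N" (any base) complements to itself.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}

HSE_CONSENSUS = "nGAAnnTTCnnGAAnnTTCn"


def reverse_complement(motif: str) -> str:
    """Reverse complement of a DNA motif written with A, C, G, T and n."""
    return "".join(COMPLEMENT[base] for base in reversed(motif.upper()))


def is_palindromic(motif: str) -> bool:
    """True if the motif reads the same 5'->3' on both strands."""
    return motif.upper() == reverse_complement(motif)


def find_hse(dna: str, motif: str = HSE_CONSENSUS) -> list[int]:
    """0-based start positions of motif matches on the given strand."""
    body = motif.upper().replace("N", "[ACGT]")
    return [m.start() for m in re.finditer(f"(?={body})", dna.upper())]


print(is_palindromic(HSE_CONSENSUS))  # True
print(find_hse("CCAGAAGCTTCTAGAACGTTCAGG"))  # [2]
```

Since the consensus equals its own reverse complement, scanning one strand also finds the sites that the opposite strand would present, so no second pass is needed.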

[0833] Consensus pattern: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-[LIVM].
[ 1] Sorger RK. Cell 65:363-366(1991). 

[ 2] Mager W.H., Moradas Fen-eira R Biochem. J. 290:1-13(1993). 

[ 3] Vuister G.W.. Kim S.^.. Orosz A.. Marquardt J,. Wu C, Bax A. Nat. Struct. Biol. 1:605-613(1994). 
[ 4] Fujita A.. Kikuchi Y, Kuhara S., Misumi Y. Matsumoto S., Kobayashi H. Gene 85:321-328(1989). 
[ 5] Morgan B.A., Bouquin N., Merrill G.R, Johnston LH. EMBO J. 14:5679-5689(1995). 
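The consensus patterns in this section use PROSITE pattern syntax: elements separated by `-`, `x` for any residue, `[..]` for a set of allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` or `>` for N- or C-terminal anchoring. As a minimal sketch, assuming only this core syntax (the function names are hypothetical), such a pattern can be translated into a Python regular expression and used to scan a protein sequence. The demo sequence is synthetic, built to satisfy the HSF pattern above, and is not a real protein.

```python
import re
from typing import List, Tuple

# One PROSITE element: optional "<", then x, a residue, [allowed] or
# {forbidden}, then an optional "(n)" / "(n,m)" repeat, then optional ">".
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE consensus pattern such as 'L-x(3)-[FY]-K' to a regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, core, low, high, c_term = m.groups()
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"   # any residue except those listed
        else:
            token = core                      # a single residue or [allowed set]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + token + ("$" if c_term else ""))
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> List[Tuple[int, int, str]]:
    """Return 1-based (start, end, matched residues) for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(sequence)]


HSF_PATTERN = ("L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-"
               "[FYW]-[RKH]-K-[LIVM]")

if __name__ == "__main__":
    print(prosite_to_regex(HSF_PATTERN))
    # Synthetic sequence built to satisfy the pattern; not a real HSF.
    demo = "MSLAAAFKHANASSFVRQLNAYAFRKVEE"
    print(find_motifs(HSF_PATTERN, demo))
```

`re.finditer` reports non-overlapping matches only; a scanner that must report overlapping hits would need a lookahead-based search instead.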

[0834] 281. Heat shock hsp20 proteins family profile

Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 20 Kd, known as the hsp20 proteins [2 to 5]. These seem to act as chaperones that can protect other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large heterooligomeric aggregates; their family is currently composed of the following members: - Vertebrate heat shock protein hsp27 (hsp25), induced by a variety of environmental stresses. - Drosophila heat shock proteins hsp22, hsp23, hsp26, hsp27, hsp67BA and hsp67BC. - Caenorhabditis elegans hsp16 multigene family. - Fungal HSP26 (budding yeast) and hsp30 (Neurospora crassa and Aspergillus nidulans). - Plant small hsp's. Plants have four classes of hsp20: classes I and II, which are cytoplasmic, class III, which is chloroplastic, and class IV, which is found in the endomembrane. - Alpha-crystallin A and B chains. Alpha-crystallin is an abundant constituent of the eye lens of most vertebrate species. Its main function appears to be to maintain the correct refractive index of the lens. It is also found in other tissues, where it seems to act as a chaperone [6]. - Schistosoma mansoni major egg antigen p40. Structurally, p40 is built of two tandem hsp20 domains. - A variety of prokaryotic proteins: ibpA and ibpB from Escherichia coli, hsp18 from Clostridium acetobutylicum, spore protein SP21 (hspA) from Stigmatella aurantiaca, Mycobacterium leprae 18 Kd antigen and Mycobacterium tuberculosis 14 Kd antigen. - Methanococcus jannaschii hypothetical protein MJ0285. Structurally, this family is characterized by the presence of a conserved C-terminal domain of about 100 residues. The profile developed to detect members of the hsp20 family is based on an alignment of this domain.
[0835] - Sequences known to belong to this class detected by the profile: ALL.

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] de Jong W.W., Leunissen J.A.M., Voorter C.E.M. Mol. Biol. Evol. 10:103-126(1993).
[ 3] Caspers G.J., Leunissen J.A.M., de Jong W.W. J. Mol. Evol. 40:238-248(1995).
[ 4] Jaenicke R., Creighton T.E. Curr. Biol. 3:234-235(1993).
[ 5] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994).
[ 6] Groenen P.J.T.A., Merck K.B., de Jong W.W., Bloemendal H. Eur. J. Biochem. 225:1-9(1994).
[0836] 282. Heat shock hsp70 proteins family signatures 

[0837] Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by the induction of the synthesis of proteins collectively known as heat-shock proteins (hsp) [1]. Amongst them is a family of proteins with an average molecular weight of 70 Kd, known as the hsp70 proteins [2,3,4]. In most species, there are many proteins that belong to the hsp70 family. Some of them are expressed under unstressed conditions. Hsp70 proteins can be found in different cellular compartments (nuclear, cytosolic, mitochondrial, endoplasmic reticulum, etc.). Some of the hsp70 family proteins are listed below: - In Escherichia coli and other bacteria, the main hsp70 protein is known as the dnaK protein. A second protein, hscA, has been recently discovered. dnaK is also found in the chloroplast genome of red algae. - In yeast, at least ten hsp70 proteins are known to exist: SSA1 to SSA4, SSB1, SSB2, SSC1, SSD1 (KAR2), SSE1 (MSI3) and SSE2. - In Drosophila, there are at least eight different hsp70 proteins: HSP70, HSP68, and HSC-1 to HSC-6. - In mammals, there are at least eight different proteins: HSPA1 to HSPA6, HSC70, and GRP78 (also known as the immunoglobulin heavy chain binding protein (BiP)). - In the sugar beet yellow virus (SBYV), an hsp70 homolog has been shown [5] to exist. - In archaebacteria, hsp70 proteins are also present [6]. All proteins belonging to the hsp70 family bind ATP. A variety of functions has been postulated for hsp70 proteins. It now appears [7] that some hsp70 proteins play an important role in the transport of proteins across membranes. They also seem to be involved in protein folding and in the assembly/disassembly of protein complexes [8]. Three signature patterns for the hsp70 family of proteins were derived: the first is centered on a conserved pentapeptide found in the N-terminal [...] the central part of the sequence [...].

[0838] Consensus pattern: [IV]-D-L-G-T-[ST]-x-[SC]

Consensus pattern: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-[...]-[GSH]-[GSHAST]-x(3)-[...]
Consensus pattern: [LIVMY]-x-[LIVMF]-x-G-[...]-x-[ST]-x-[LIVM]-P-x-[...]

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988).
[ 2] Pelham H.R.B. Cell 46:959-961(1986).

[ 3] Pelham H.R.B. Nature 332:776-777(1988).
[ 4] Craig E.A. BioEssays 11:48-52(1989).

[ 5] [...] Karasev A.V., Koonin E.V., Dolja V.V. J. Mol. Biol. 217:603-610(1991).

[ 6] Gupta R.S., Singh B. J. Bacteriol. 174:4594-4605(1992).
[ 7] Deshaies R.J., Koch B.D., Schekman R. Trends Biochem. Sci. 13:384-388(1988).
[ 8] Craig E.A., Gross C.A. Trends Biochem. Sci. 16:135-140(1991).

[0839] 283. Heat shock hsp90 proteins family signature

[...] - Escherichia coli and other bacteria heat shock protein C62.5 (gene htpG). - Vertebrate hsp90 alpha [...]

[0840] Consensus pattern: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED]

[ 1] Lindquist S., Craig E.A. Annu. Rev. Genet. 22:631-677(1988) 

[ 2] Nadeau K., Das A., Walsh C.T. J. Biol. Chem. 268:1479-1487(1993).

[ 3] Jakob U., Buchner J. Trends Biochem. Sci. 19:205-211(1994). 

[0841] 284. Helix-turn-helix (HTH3)

[...] Cro and CI [...]. Swiss:P03034.

285. Heme oxygenase signature

Heme oxygenase (EC 1.14.99.3) (HO) [1] is the microsomal enzyme that, in animals, carries out the oxidation of heme to biliverdin, which is then converted to bilirubin by biliverdin reductase. In mammals there are three isozymes of heme oxygenase: HO-1, HO-2 and HO-3.

[0844] Consensus pattern: L-[IV]-A-H-[STACH]-Y-[STV]-[RT]-Y-[LIVM]-G [H binds the heme]

[ 1] Maines M.D. FASEB J. 2:2557-2568(1988).
[ 2] Barinaga M. Science 259:309-309(1993).

[ 3] Richaud C., Zabulon G. Proc. Natl. Acad. Sci. U.S.A. 94:11736-11741(1997).
[ 4] Schmitt M.P. J. Bacteriol. 179:838-845(1997).

[0845] 286. Hepatitis core antigen. 

The core antigen of [...] possesses a carboxyl terminus rich in arginine. On this basis it was [...] experimental evidence to support [...].

[1] [...] Leadbetter G, Murray K; [...] 1979;282:575-579. [2] Gallina A, Bonelli F, Zentilin L, Rindi G, Muttini M, Milanesi G; J Virol 1989;63:4[...]
[0848] 287. Histidine biosynthesis protein

[0849] Proteins involved in steps 4 and 6 of the histidine biosynthesis pathway are contained in this family. Histidine is formed by several complex and distinct biochemical reactions catalysed by eight enzymes. The enzymes in this Pfam entry are called His6 and His7 in eukaryotes and HisA and HisF in prokaryotes.

[1] [...] Lazcano A, Lio P, Barberio C, Casalone E, Cavalieri D, Perito B, Polsinelli M; Gene 1997;197:9-17. [2] Fani R, Lio P, Chiarelli I, Bazzicalupo M; J Mol Evol 1994;38:489-495.
[0851] 288. Histone deacetylase family 

[0852] Histones can be reversibly acetylated on several lysine residues. Transcription is regulated in part by this mechanism. Histone deacetylases catalyse the removal of the acetyl group. Histone deacetylases are related to other proteins [1].

[0853] [1] Leipe DD, Landsman D; Nucleic Acids Res 1997;25:3693-3697.
[0854] 289. Histidinol dehydrogenase signature 

Histidinol dehydrogenase (EC 1.1.1.23) (HDH) catalyzes the terminal step in the biosynthesis of histidine in bacteria, fungi, and plants: the four-electron oxidation of L-histidinol to histidine. In bacteria HDH is a single-chain polypeptide; in fungi it is the C-terminal domain of a multifunctional enzyme which catalyzes three different steps of histidine biosynthesis; and in plants it is expressed as a nuclear-encoded protein precursor which is exported to the chloroplast [1]. As a signature pattern, a highly conserved region located in the central part of HDH was selected. This region does not correspond to the part of the enzyme that, in most but not all HDH sequences, contains a cysteine residue which, in Salmonella typhimurium, has been said [2] to be important for the catalytic activity of the enzyme.
Consensus pattern: [...]-D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-[SACLHDE]

[ 1] Nagai A., Ward E., Beck J., Tada S., Chang J.-Y., Scheidegger A., Ryals J. Proc. Natl. Acad. Sci. U.S.A. 88:4133-4137(1991).

[ 2] Grubmeyer C.T., Gray W.R. Biochemistry 25:4778-4784(1986).
[0856] 290. Homoserine dehydrogenase signature 

Homoserine dehydrogenase (EC 1.1.1.3) (HDh) [1,2] catalyzes the NAD-dependent reduction of aspartate beta-semialdehyde into homoserine. This reaction is the third step in a pathway leading from aspartate to homoserine. The latter participates in the biosynthesis of threonine and then isoleucine, as well as in that of methionine. HDh is found either as a single-chain protein, as in some bacteria and yeast, or as a bifunctional enzyme consisting of an N-terminal aspartokinase domain and a C-terminal HDh domain, as in bacteria such as Escherichia coli and in plants. As a signature pattern, the best conserved region of HDh has been selected. This is a segment of 23 to 24 residues located in the central section that contains two conserved aspartate residues.

[0857] Consensus pattern: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K

[ 1] Thomas D.. Barbey R., Surdin-Kerjan Y. FEBS Lett. 323:289-293(1993). 
[ 2] Cami B., Clepet C., Patte J.-C. Biochimie 75:487-495(1993).
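The stated length of 23 to 24 residues follows directly from the repeat counts in the homoserine dehydrogenase pattern, because x(2,3) is the only variable-length element. A minimal Python sketch (the helper name `pattern_length_range` and the two demo strings are hypothetical, and the strings are synthetic rather than real HDh sequences) sums the shortest and longest span of each element and checks both lengths against a hand-translated regex.

```python
import re
from typing import Tuple

HDH_PATTERN = ("A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-"
               "x-D-x(3)-K")

# Hand translation of HDH_PATTERN into a regex; x(2,3) becomes .{2,3}.
HDH_REGEX = re.compile(r"A.{3}G[LIVMFY][STAG].{2,3}[DNS]P.{2}D[LIVM].G.D.{3}K")

# Optional "(n)" or "(n,m)" repeat suffix at the end of a pattern element.
_REPEAT = re.compile(r"\((\d+)(?:,(\d+))?\)$")


def pattern_length_range(pattern: str) -> Tuple[int, int]:
    """Return the (shortest, longest) number of residues a pattern can span."""
    shortest = longest = 0
    for element in pattern.split("-"):
        m = _REPEAT.search(element)
        if m is None:
            low = high = 1                          # one position
        else:
            low = int(m.group(1))
            high = int(m.group(2) or m.group(1))    # "(n)" means exactly n
        shortest += low
        longest += high
    return shortest, longest


if __name__ == "__main__":
    print(pattern_length_range(HDH_PATTERN))  # (23, 24)
    # Two synthetic strings using the short and the long form of x(2,3).
    for demo in ("AAAAGLSAADPAADLAGADAAAK", "AAAAGLSAAADPAADLAGADAAAK"):
        print(len(demo), bool(HDH_REGEX.fullmatch(demo)))
```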

[0858] 291. Haloacid dehalogenase-like hydrolase

[0859] This family is structurally different from the alpha/beta hydrolase family (abhydrolase). This family includes L-2-haloacid dehalogenase, epoxide hydrolases and phosphatases. The structure of the family consists of two domains. One is an inserted four-helix bundle, which is the least well conserved region of the alignment, between residues 16 and 96 of Swiss:P24069. The rest of the fold is composed of the core alpha/beta domain.
[1] Hisano T, Hata Y, Fujii T, Liu JQ, Kurihara T, Esaki N, Soda K; J Biol Chem 1996;271:20322-20330.
[0860] 292. DEAD and DEAH box families ATP-dependent helicases signatures (helicase_C)
A number of eukaryotic and prokaryotic proteins have been characterized [1,2,3] on the basis of their structural similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. Proteins currently known to belong to this family are: - Initiation factor eIF-4A. Found in eukaryotes, this protein is a subunit of a high molecular weight complex involved in 5' cap recognition and the binding of mRNA to ribosomes. It is an ATP-dependent RNA-helicase.

- PRP5 and PRP28. These yeast proteins are involved in various ATP-requiring steps of the pre-mRNA splicing process.

- PL10, a mouse protein expressed specifically during spermatogenesis. - An3, a Xenopus putative RNA helicase [...]. - SPP81/DED1 and DBP1, two yeast proteins probably involved in pre-mRNA splicing and related to PL10. - Caenorhabditis elegans helicase glh-1. - MSS116, a yeast protein required for mitochondrial splicing. - [...] maturation of 25S ribosomal RNA. - p68, a human nuclear antigen. p68 has ATPase and DNA-helicase activities in vitro. It is involved in cell growth and division. - Rm62 (p62), a Drosophila putative RNA helicase related to p68. - DBP2, a yeast protein related to p68. - DHH1, a yeast protein. - DRS1, a yeast protein involved in ribosome assembly. - MAK5, a yeast protein involved in maintenance of dsRNA killer plasmid. [...]

Some other proteins belong to a subfamily which have His instead of the second Asp and are thus said to be 'D-E-A-H-box' proteins [...,5,6]. Proteins currently known to belong to this subfamily are: - PRP2, PRP16, PRP22 and PRP[...]. These yeast proteins are all involved in various ATP-requiring steps of the pre-mRNA splicing process. [...] repair of DNA damaged by UV light, bulky adducts or [...]. [...] Escherichia coli [...] putative RNA helicase. Signature patterns were developed for both subfamilies.

[0861] Consensus pattern: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN]
Consensus pattern: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR]

[ 1] Schmid S.R., Linder P. Mol. Microbiol. 6:283-292(1992).

[ 2] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).

[ 3] Wassarman D.A., Steitz J.A. Nature 349:463-464(1991).

[ 4] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).

[ 5] Harosh I., Deschavanne P. Nucleic Acids Res. 19:6331-6331(1991).

[ 6] Koonin E.V., Senkevich T.G. J. Gen. Virol. 73:989-993(1992).
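The two consensus patterns separate the subfamilies at the position where DEAH-box proteins carry His instead of the second Asp. As a minimal sketch (the function name and the three demo fragments are hypothetical; the fragments are synthetic, not real helicase sequences), both patterns can be translated by hand into Python regular expressions and used to label a sequence.

```python
import re

# Hand translations of the two consensus patterns.
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")


def classify_helicase_motif(sequence: str) -> str:
    """Report which of the two subfamily signatures a sequence contains."""
    has_dead = DEAD_BOX.search(sequence) is not None
    has_deah = DEAH_BOX.search(sequence) is not None
    if has_dead and has_deah:
        return "both"
    if has_dead:
        return "DEAD"
    if has_deah:
        return "DEAH"
    return "none"


if __name__ == "__main__":
    # Synthetic fragments, not real helicase sequences.
    for fragment in ("MKVLDEADRMLSE", "MGAVIIDEAHERK", "ACDEFGHIK"):
        print(fragment, classify_helicase_motif(fragment))
```

A match to either signature is only an indication of family membership; a sequence can also carry the motif by chance.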

[0862] 293. Heme-binding domain in cytochrome b5 and oxidoreductases (heme_1)

[0863] Cytochrome b5 is a membrane-bound hemoprotein which acts as an electron carrier for several membrane-bound [...] microsomes [...]

- [...]

- Nitrate reductase [...] in the first step of nitrate assimilation in plants, fungi and [...]. [...] molybdopterin [...] (see <PDOC00484>), a heme-binding domain called cytochrome b557, as well as a cytochrome reductase domain.

- Sulfite oxidase [...] degradation of sulfur-containing amino acids. Also consists of a molybdopterin domain and a heme-binding domain.

This family of proteins also includes: 

- TU-36B, a Drosophila muscle protein of unknown function [6].

- Fission yeast hypothetical protein SpAC1F12.10c.

- Yeast hypothetical protein YMR073c. 

- Yeast hypothetical protein YMR272c. 

[0864] A segment that includes the first of the two histidine heme ligands was used as a signature pattern for the heme-binding domain of the cytochrome b5 family.

[0865] Consensus pattern: [FY]-[LIVMK]-x(2)-H-P-[GA]-G [H is a heme axial ligand]

[1] Ozols J. Biochim. Biophys. Acta 997:121-130(1989). 
[2] Guiard B. EMBO J. 4:3265-3272(1985).

[3] [...] Vincentz M., Rouze P., Galangau F., Vaucheret H., Cherel I., Meyer C., Kronenberger J., Caboche M. Mol. Gen. Genet. 209:552-562(1987).

[4] Crawford N.M., Smith M., Bellissimo D., Davis R.W. Proc. Natl. Acad. Sci. U.S.A. 85:5006-5010(1988).
[5] Guiard B., Lederer F. Eur. J. Biochem. 100:441-453(1979).
[6] Levin R.J., Boychuk P.L., Croniger C.M., Kazzaz J.A., Rozek C.E. Nucleic Acids Res. 17:6349-6367(1989).

[0866] 294. Hexapeptide-repeat containing-transferases signature 

On the basis of sequence similarity, a number of transferases have been proposed [1,2,3,4] to belong to a single family: - Serine acetyltransferase (EC 2.3.1.30) (SAT) (gene cysE), an enzyme involved in cysteine biosynthesis. - Azotobacter chroococcum nitrogen fixation protein nifP. NifP is most probably a SAT involved in the optimization of nitrogenase activity. - Escherichia coli thiogalactoside acetyltransferase (EC 2.3.1.18) (gene lacA), an enzyme involved in the biosynthesis of lactose. - UDP-N-acetylglucosamine acyltransferase [...], an enzyme involved in the biosynthesis of lipid A, a phosphorylated glycolipid that anchors the lipopolysaccharide to the [...]. - UDP-3-O-(3-hydroxymyristoyl) glucosamine N-acyltransferase [...] of lipid A. - Chloramphenicol acetyltransferase (CAT) (EC 2.3.1.28) from Agrobacterium tumefaciens, Bacillus sphaericus, Escherichia coli plasmid IncFII NR79, Pseudomonas [...]. [...] to the main family [...] (see [...]). - Rhizobium nodulation protein nodL. NodL is an acetyltransferase involved in the O-acetylation [...]. - [...] tetrahydrodipicolinate N-succinyltransferase (EC 2.3.1.117) (gene dapD), which catalyzes the fourth step in the biosynthesis of diaminopimelate and lysine from aspartate semialdehyde. - Bacterial N-acetylglucosamine-1-phosphate uridyltransferase (EC 2.7.7.23) (gene glmU or gcaD or tms), an enzyme involved in peptidoglycan and lipopolysaccharide biosynthesis. - Staphylococcus aureus protein capG, which is involved in biosynthesis of type 1 capsular polysaccharide. - Yeast hypothetical [...]. - Methanococcus jannaschii hypothetical protein MJ1064. These proteins have been shown [3,4] to [...] repeat [...]. [...] tertiary structure [...] [5] has been shown to form a left-handed parallel beta helix. Our signature pattern is based on a fourfold repeat of this hexapeptide.

Consensus pattern: [...]-[GAED]-x(2)-[STAVR]-x-[LIV]-[...]

[ 1] Downie J.A. Mol. Microbiol. 3:1649-1651(1989). 

[ 2] Parent R., Roy P.H. J. Bacteriol. 174:2891-2897(1992).

[ 3] Vaara M. FEMS Microbiol. Lett. 97:249-254(1992).

[ 4] Vuorio R., Haerkonen T., Tolvanen M., Vaara M. FEBS Lett. 337:289-292(1994).
[ 5] Raetz C.R.H., Roderick S.L. Science 270:997-1000(1995).
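The signature above rests on a fourfold tandem repeat of a hexapeptide. Because the full pattern is only partly legible in this text, the sketch below does not reproduce it; instead it assumes the commonly cited hexapeptide unit [LIV]-[GAED]-x(2)-[STAV]-x and looks for four or more consecutive copies. The unit, the function name and the demo strings are assumptions for illustration; the demo strings are synthetic.

```python
import re
from typing import List, Tuple

# ASSUMPTION: the hexapeptide unit is taken to be [LIV]-[GAED]-x(2)-[STAV]-x;
# the full signature in the source text is only partly legible.
HEXAPEPTIDE = r"[LIV][GAED].{2}[STAV]."
# Four or more tandem copies of the unit (the "fourfold repeat").
TANDEM_REPEAT = re.compile(rf"(?:{HEXAPEPTIDE}){{4,}}")


def find_hexapeptide_repeats(sequence: str) -> List[Tuple[int, int, int]]:
    """Return 1-based (start, end, number_of_units) for each tandem run."""
    hits = []
    for m in TANDEM_REPEAT.finditer(sequence):
        units = (m.end() - m.start()) // 6
        hits.append((m.start() + 1, m.end(), units))
    return hits


if __name__ == "__main__":
    unit = "IGKATL"  # synthetic unit that fits the assumed motif
    print(find_hexapeptide_repeats("MMMM" + unit * 4 + "PPPP"))  # [(5, 28, 4)]
    print(find_hexapeptide_repeats("MMMM" + unit * 3 + "PPPP"))  # []
```

Requiring several consecutive units, rather than a single hexapeptide, is what keeps such a repeat-based signature from firing on isolated chance matches.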

[086q] 295. Hexokinases signature. Hexokinase (EC 2.7.1.1) [1,2] is an important glycolytic enzyme that catalyzes the phosphorylation of keto- and aldohexoses (e.g. glucose, mannose and fructose) using MgATP as the phosphoryl donor. [...] isoenzymes, commonly referred to as types I, II, III and IV. Type IV hexokinase, [...] designated glucokinase [3], is only expressed in liver and pancreatic beta cells and [...] of a mass of about 50 Kd. Hexokinases of types I to III, which have low Km values for glucose, have a molecular mass of about 100 Kd. Structurally, they consist of a very small N-terminal hydrophobic membrane-binding domain followed by two highly similar domains of [...]. The first domain has lost its catalytic activity and has evolved into a regulatory domain. In yeast there are [...] molecular mass of about 50 Kd. All these enzymes contain one (or two in the case of types I to III isozymes) strongly conserved region which has been shown [4] to be involved in substrate binding. A pattern from this region [...]

[ 3] [...] Sci. 16:281-282(1991). [ 4] Schirch D.M., Wilson J.E. Arch. Biochem. Biophys. 254:385-396(1987).
[0871] 296. Histone H2A signature (his1)

Histone H2A is one of the four histones, along with H2B, H3 and H4, which form the eukaryotic nucleosome core. Using alignments of histone H2A sequences [1,2,E1], a conserved region in the N-terminal part of H2A was selected as a signature pattern. This region is conserved both in classical S-phase regulated H2As and in variant histone H2As which are synthesized throughout the cell cycle.
[0872] Consensus pattern: [AC]-G-L-x-F-P-V

[ 1] Wells D.E.. Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[0873] Histone H4 signature (his2) 

[0874] Histone H4 is one of the four histones, along with H2A, H2B and H3, which form the eukaryotic nucleosome core. Along with H3, it plays a central role in nucleosome formation. The sequence of histone H4 has remained almost invariant in more than 2 billion years of evolution [1]. The region used as a signature pattern is a pentapeptide found in positions 14 to 18 of all H4 sequences. It contains a lysine residue which is often acetylated [2] and a histidine residue which is implicated in DNA-binding [3].
[0875] Consensus pattern: G-A-K-R-H

[ 1] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[2] Doenecke D., Gallwitz D. Mol. Cell. Biochem. 44:113-128(1982). 

[ 3] Ebralidse K.K.. Grachev S.A., Mirzabekov A.D. Nature 331:365-367(1988). 
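The positional claim for the H4 signature can be checked on a real sequence. The sketch below uses the first 20 residues of human histone H4 counted after removal of the initiator methionine (SGRGKGGKGLGKGGAKRHRK); restricting the example to this fragment and the helper name `signature_span` are choices made for illustration. Because G-A-K-R-H has no ambiguous positions, its regex is simply the literal string.

```python
import re
from typing import Optional, Tuple

# Residues 1-20 of human histone H4 after removal of the initiator Met.
H4_N_TERMINUS = "SGRGKGGKGLGKGGAKRHRK"

# G-A-K-R-H has no ambiguous positions, so the regex is the literal string.
H4_SIGNATURE = re.compile(r"GAKRH")


def signature_span(sequence: str) -> Optional[Tuple[int, int]]:
    """Return the 1-based inclusive span of the first match, or None."""
    m = H4_SIGNATURE.search(sequence)
    return (m.start() + 1, m.end()) if m else None


if __name__ == "__main__":
    start, end = signature_span(H4_N_TERMINUS)
    lys, his = start + 2, start + 4   # K is the 3rd and H the 5th motif residue
    print(start, end)                 # 14 18
    print(H4_N_TERMINUS[lys - 1], lys, H4_N_TERMINUS[his - 1], his)  # K 16 H 18
```

The lysine inside the motif is residue 16 and the histidine is residue 18, consistent with the acetylated lysine and DNA-binding histidine described above.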

[0876] Histone H3 signatures (his3) 

Histone H3 is one of the four histones, along with H2A, H2B and H4, which form the eukaryotic nucleosome core. It is a highly conserved protein of 135 amino acid residues [1,2,E1]. The following proteins have been found to contain a C-terminal H3-like domain: - Mammalian centromeric protein CENP-A [3], which could act as a core histone necessary for the assembly of centromeres. - Yeast chromatin-associated protein CSE4 [4]. - Caenorhabditis elegans chromosome III encodes two highly related proteins (F54C8.2 and F58A4.3) whose C-terminal section is evolutionarily related to the last 100 residues of H3. The function of these proteins is not yet known. Two signature patterns were developed. The first one corresponds to a perfectly conserved heptapeptide in the N-terminal part of H3. The second one is derived from a conserved region in the central section of H3.
[0877] Consensus pattern: K-A-P-R-K-Q-L
Consensus pattern: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991). 

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994). 

[ 3] Sullivan K.F., Hechenberger M., Masri K. J. Cell Biol. 127:581-592(1994).

[ 4] Stoler S., Keith K.C., Curnick K.E., Fitzgerald-Hayes M. Genes Dev. 9:573-586(1995).
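Both H3 signatures can be located on the N-terminal 80 residues of human histone H3.1 (initiator methionine removed). The sketch below is illustrative only: the fragment, the dictionary layout and the helper name `scan_h3` are choices made here, and the second pattern is translated by hand (`x` becomes `.`, bracketed sets stay as regex character classes).

```python
import re
from typing import Dict, List, Tuple

# Residues 1-80 of human histone H3.1 (initiator Met removed).
H3_RESIDUES_1_80 = (
    "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"
    "YRPGTVALREIRRYQKSTELLIRKLPFQRLVREIAQDFKT"
)

# Regex translations of the two H3 consensus patterns.
H3_SIGNATURES = {
    "K-A-P-R-K-Q-L": re.compile(r"KAPRKQL"),
    "P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV]": re.compile(r"PF.[RA]L[VA][KRQ][DEG][IV]"),
}


def scan_h3(sequence: str) -> Dict[str, List[Tuple[int, int]]]:
    """Map each pattern to the 1-based inclusive spans where it matches."""
    return {
        name: [(m.start() + 1, m.end()) for m in regex.finditer(sequence)]
        for name, regex in H3_SIGNATURES.items()
    }


if __name__ == "__main__":
    for name, spans in scan_h3(H3_RESIDUES_1_80).items():
        print(name, spans)
```

On this fragment the heptapeptide falls at residues 14 to 20, and the second pattern matches PFQRLVREI at residues 66 to 74.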

[0878] Histone H2B signature (his4) 

[0879] Histone H2B is one of the four histones, along with H2A, H3 and H4, which form the eukaryotic nucleosome core. Using alignments of histone H2B sequences [1,2,E1], a conserved region was selected in the C-terminal part of H2B.

[0880] Consensus pattern: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAGHDE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G

[ 1] Wells D.E., Brown D. Nucleic Acids Res. 19:2173-2188(1991).

[ 2] Thatcher T.H., Gorovsky M.A. Nucleic Acids Res. 22:174-179(1994).

[0881] 297. 'Homeobox' domain signature and profile (home1)

The 'homeobox' is a protein domain of 60 amino acids [1 to 5,E1] first identified in a number of Drosophila homeotic and segmentation proteins. It has since been found to be extremely well conserved in many other animals, including vertebrates. This domain binds DNA through a helix-turn-helix type of structure. Some of the proteins which contain a homeobox domain play an important role in development. Most of these proteins are known to be sequence-specific DNA-binding transcription factors. The homeobox domain has also been found to be very similar to a region of the yeast mating type proteins. These are sequence-specific DNA-binding proteins that act as master switches in yeast differentiation by controlling gene expression in a cell type-specific fashion. A schematic representation of the homeobox domain is shown below. The helix-turn-helix region is shown by the symbols 'H' (for helix) and 't' (for turn).

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx

[ 1] Gehring W.J. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp1-10, Oxford University Press, Oxford.
[ 2] Buerglin T.R. (In) Guidebook to the homeobox genes, Duboule D., Ed., pp25-72, Oxford University Press, Oxford.

[ 3] Gehring W.J. Trends Biochem. Sci. 17:277-280(1992).

[ 4] Gehring W.J., Hiromi Y. Annu. Rev. Genet. 20:147-173(1986).

[ 5] Schofield P.N. Trends Neurosci. 10:3-6(1987).

[0883] 'Homeobox' antennapedia-type protein signature (home2) 

The homeotic Hox proteins are sequence-specific transcription factors. They are part of a developmental regulatory [...]

[0884] Consensus pattern: [LIVMFE]-[FY]-P-W-M-[KRQTA]

[ 1] McGinnis W., Krumlauf R. Cell 68:283-302(1992).
[ 2] Scott M.P. Cell 71:551-553(1992).

[0885] 'Homeobox' engrailed-type protein signature (home3)

[...] 'homeobox' domain can be classified [1,2] on the basis of their sequence characteristics. [...] subfamily are: - Drosophila segmentation polarity protein engrailed (en), which specifies the body segmentation [...] and is required for the development of the central nervous system. - Drosophila invected (inv) [...] compartmentalization [...]. - Honeybee E30 and E60. - Grasshopper (Schistocerca americana) G-En. - Mammalian and birds En-1 and En-2. - Zebrafish Eng-1 [...]. - [...] (Tripneustes gratilla) SU-HB-en. - Leech (Helobdella triserialis) Ht-En. [...] characterized by the presence of a conserved region [...].

[0887] Consensus pattern: L-M-A-[EQ]-G-L-Y-N

[ 1] Scott M.P., Tamkun J.W., Hartzell G.W. III Biochim. Biophys. Acta 989:25-48(1989).
[ 2] Gehring W.J. Science 236:1245-1252(1987).

[0888] 298, Isocitrate lyase signature (ICL) 

Isocitrate lyase [...] catalyzes the conversion of isocitrate to succinate and glyoxylate. This is the first step in the glyoxylate bypass, an alternative to the tricarboxylic acid cycle in bacteria, fungi and [...]. A cysteine, a histidine and a glutamate or aspartate have been found to be important for [...]. Only one cysteine residue is conserved between the sequences [...] of a hexapeptide that can be used as a signature pattern for this type of enzyme.

[0889] Consensus pattern: K-[KR]-C-G-H-[LMQ] [C is a putative active site residue]

[ 1] Beeching J.R. Protein Seq. Data Anal. 2:463-466(1989).
[ 2] Atomi H., Ueda M., Hikida M., Hishida T., Teranishi Y., Tanaka A. J. Biochem. 107:262-266(1990).
[0890] 299. Initiation factor 2 subunit

[0891] This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from archaebacteria and IF-2 from prokaryotes. Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit.
[0892] [1] Kyrpides NC, Woese CR; Proc Natl Acad Sci U S A 1998;95:3726-3730.
[0893] 300. Initiation factor 3 signature 

Initiation factor 3 (IF-3) (gene infC) [1] is one of the three factors required for the initiation of protein biosynthesis in bacteria. IF-3 is thought to function as a fidelity factor during the assembly of the ternary initiation complex, which consists of the 30S ribosomal subunit, the initiator tRNA and the messenger RNA. IF-3 binds to the 30S ribosomal subunit. It is a basic protein of 141 to 212 residues. The chloroplast initiation factor IF-3(chl) is a protein that enhances the poly(A,U,G)-dependent binding of the initiator tRNA to chloroplast ribosomal 30S subunits. In its mature form it is a protein of about 400 residues whose central section is evolutionarily related to the sequence of bacterial IF-3 [2]. As a signature pattern, a highly conserved region located in the central section of bacterial IF-3 and of IF-3(chl) was selected.
[0894] Consensus pattern: [KR]-[LIVM](2)-[DNHFY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQTH]-x(2)-[KRQ]

[ 1] Liveris D., Schwartz J.J., Geertman R., Schwartz I. FEMS Microbiol. Lett. 112:211-216(1993).
[ 2] Lin Q., Ma L., Burkhart W., Spremulli L.L. J. Biol. Chem. 269:9436-9444(1994).

[0895] 301. Imidazoleglycerol-phosphate dehydratase signatures (IGPD)

Imidazoleglycerol-phosphate dehydratase (EC 4.2.1.19) is the enzyme that catalyzes the seventh step in the biosynthesis of histidine in bacteria, fungi and plants. In most organisms it is a monofunctional protein of about 22 to 29 Kd. In some bacteria, such as Escherichia coli, it is the C-terminal domain of a bifunctional protein that includes a histidinol-phosphatase domain [1]. Two signature patterns were developed, each including two consecutive histidine residues.
[0896] Consensus pattern: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM]
Consensus pattern: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K

[0897] [ 1] Carlomagno M.S., Chiariotti L., Alifano P., Nappo A.G., Bruni C.B. J. Mol. Biol. 203:585-606(1988).
[0898] 302. Indole-3-glycerol phosphate synthase signature (IGPS)

Indole-3-glycerol phosphate synthase (EC 4.1.1.48) (IGPS) catalyzes the fourth step in the biosynthesis of tryptophan: the ring closure of 1-(2-carboxyphenylamino)-1-deoxyribulose into indole-3-glycerol phosphate. In some bacteria IGPS is a single-chain enzyme. In others, such as Escherichia coli, it is the N-terminal domain of a bifunctional enzyme that also catalyzes N-(5'-phosphoribosyl)anthranilate isomerase (PRAI) activity, the third step of tryptophan biosynthesis. In fungi, IGPS is the central domain of a trifunctional enzyme that also contains a PRAI C-terminal domain and a glutamine amidotransferase N-terminal domain. The N-terminal section of IGPS contains a highly conserved region which X-ray crystallography studies [1] have shown to be part of the active site cavity. This region was used as a signature pattern for IGPS.

[0899] Consensus pattern: [LIVMFY]-[LIVMC]-x-E-[LIVMFYC]-K-[KRSP]-[STAK]-S-P-[ST]-x(3)-[LIVMFYST]
[0900] [ 1] Wilmanns M., Priestle J.P., Niermann T., Jansonius J.N. J. Mol. Biol. 223:477-507(1992).
[0901] 303. (IL2) Interleukin 2. 31 members

[0902] 304. (ILVD_EDD) Dihydroxy-acid and 6-phosphogluconate dehydratases. Two dehydratases have been shown [1] to be evolutionarily related: - Dihydroxy-acid dehydratase (EC 4.2.1.9) (gene ilvD or ILV3), which catalyzes the fourth step in the biosynthesis of isoleucine and valine, the dehydration of 2,3-dihydroxy-isovaleic acid into alpha-ketoisovaleric acid. - 6-phosphogluconate dehydratase (EC 4.2.1.12) (gene edd), which catalyzes the first step in the Entner-Doudoroff pathway, the dehydration of 6-phospho-D-gluconate into 6-phospho-2-dehydro-3-deoxy-D-gluconate. - Escherichia coli hypothetical protein yjhG. Both enzymes are proteins of about 600 amino acid residues. Two highly conserved regions have been developed as signature patterns. The first pattern is located in the N-terminal part and contains a cysteine that could be involved in the binding of a 2Fe-2S iron-sulfur cluster [2]. The second pattern is located in the C-terminal half.
[0903] Consensus pattern: C-D-K-x(2)-P-[GA]-x(3)-[GA] [The C could be a 2Fe-2S ligand]
Consensus pattern: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST]

[0904] [ 1] Egan S.E., Fliege R., Tong S., Shibata A., Wolf R.E. Jr., Conway T. J. Bacteriol. 174:4638-4646(1992).
[ 2] Velasco J.A., Cansado J., Pena M.C., Kawakami T., Laborda J., Notario V. Gene 137:179-185(1993).
[0905] 305. IMP dehydrogenase / GMP reductase signature

IMP dehydrogenase (EC 1.1.1.205) (IMPDH) catalyzes the rate-limiting reaction of de novo GTP biosynthesis, the NAD-dependent oxidation of IMP into XMP [1]. Inhibition of IMP dehydrogenase activity results in the cessation of DNA synthesis. As IMP dehydrogenase is associated with cell proliferation, it is a possible target for cancer chemotherapy. Mammalian and bacterial IMPDHs are tetramers of identical chains. There are two IMP dehydrogenase isozymes in humans [2]. GMP reductase (EC 1.6.6.8) catalyzes the irreversible and NADPH-dependent reductive deamination of GMP into IMP [3]. It converts nucleobase, nucleoside and nucleotide derivatives of G to A nucleotides and maintains the intracellular balance of A and G nucleotides. IMP dehydrogenase and GMP reductase share many regions [...].

Consensus pattern: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T [C is the putative IMP-binding residue]



[ 1] Collart F.R., Huberman E. J. Biol. Chem. 263:15769-15772(1988).

[ 2] Natsumeda Y., Ohno S., Kawasaki H., Konno Y., Weber G., Suzuki K. J. Biol. Chem. 265:5292-5295(1990).
[ 3] Andrews S.C., Guest J.R. Biochem. J. 255:35-43(1988).
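Several patterns in this section carry an annotation that singles out one residue, such as the putative IMP-binding cysteine above. A minimal sketch of reporting that residue's position: wrap the annotated element in a named regex group and read its offset from each match. The function name and demo fragment are hypothetical, and the fragment is synthetic, not a real IMPDH sequence.

```python
import re
from typing import List

# Regex translation of [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T,
# with the annotated cysteine wrapped in a named group so its position can
# be reported.
IMPDH_SIGNATURE = re.compile(r"[LIVM][RK][LIVM]G[LIVM]G.GS[LIVM](?P<cys>C).T")


def annotated_cysteine_positions(sequence: str) -> List[int]:
    """Return 1-based positions of the cysteine flagged in the pattern annotation."""
    return [m.start("cys") + 1 for m in IMPDH_SIGNATURE.finditer(sequence)]


if __name__ == "__main__":
    demo = "MKKVRIGIGAGSICATEE"  # synthetic fragment, not a real IMPDH
    print(annotated_cysteine_positions(demo))  # [14]
```

The same approach applies to any pattern whose annotation points at a single element, for example an active-site residue or a heme ligand.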

[0907] 306. (IPPc) Inositol polyphosphate phosphatase family, catalytic domain
[0908] [1] York JD, Ponder JW, Chen ZW, Mathews FS, Majerus PW [...]

[2] [...] 1997;272:5983-5988. [3] Zhang X, Jefferson AB, Auethavekiat V, Majerus PW; Proc Natl Acad Sci U S A [...]:4853-4856. [4] York JD, Majerus PW; Proc Natl Acad Sci U S A 1990;87:9548-9552. [5] [...] York J[...]

FEBS Lett 1991;294:16-18.

[0909] 307. IQ calmodulin-binding motif 

[1] Xie X, Harrison DH, Schlichting I, Sweet RM, Kalabokis VN, Szent-Gyorgyi AG, Cohen C; Nature 1994;368.
[2] Rhoads AR, Friedberg F; FASEB J 1997;11:331-340. 

[0910] 308. Inosine-uridine preferring nucleoside hydrolase family signature (IU_nuc_hydro)

Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) (IU-nucleoside hydrolase or IUNH) is an enzyme first
identified [...]. It catalyzes the hydrolysis of all of the commonly occurring purine and pyrimidine nucleosides
into ribose and the associated base, but has a preference for inosine and uridine as substrates. [...] characterized;
it is a homotetrameric enzyme of subunits of 34 Kd. A histidine has been shown to be important for the catalytic
mechanism; it acts as a proton donor to activate the leaving group. IUNH is evolutionarily related to a number of
uncharacterized proteins from various biological sources, notably: - Escherichia coli hypothetical protein yaaR.
- Escherichia coli hypothetical protein ybeK. - [...] - Fission yeast hypothetical protein [...]. - Yeast hypothetical
protein [...].

For these proteins, a highly conserved region located in the N-terminal extremity was selected. This region contains
four conserved aspartates that have been shown [2] to be located in the active site cavity.
[0911] Consensus pattern: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A

[1] [...] S.L., Degano M., Sacchettini J.C., Schramm V.L. Biochemistry 35:5963-5970(1996).

[2] Degano M., Gopaul D.N., Scapin G., Schramm V.L., Sacchettini J.C. Biochemistry 35:5971-5981(1996).

[0912] 309. (Insulinase) 

Insulinase family, zinc-binding region signature 

(aka Peptidase_M16) 

[0913] A number of proteases dependent on divalent cations for their activity have been shown [1,2] to belong to
one family on the basis of sequence similarity. These enzymes are listed below:

- Insulinase (EC 3.4.24.56) (also known as insulysin or insulin-degrading enzyme or IDE), a cytoplasmic enzyme
which seems to be involved in the cellular processing of insulin, glucagon and other small polypeptides.

- Escherichia coli protease III (EC 3.4.24.55) (pitrilysin) (gene ptr), a periplasmic enzyme that degrades small peptides.

- Mitochondrial processing peptidase (EC 3.4.24.64) (MPP). This enzyme removes the transit peptide from the
precursor form of proteins imported from the cytoplasm across the mitochondrial inner membrane. It is composed of
two nonidentical homologous subunits termed alpha and beta. The beta subunit seems to be catalytically active,
while the alpha subunit has probably lost its activity.

- Nardilysin (EC 3.4.24.61) (N-arginine dibasic convertase or NRD convertase); this mammalian enzyme cleaves
peptide substrates on the N-terminus of Arg residues in dibasic stretches.
- Klebsiella pneumoniae protein pqqF. This protein is required for the biosynthesis of the coenzyme pyrroloquinoline
quinone (PQQ). It is thought to be a protease that cleaves peptide bonds in a small peptide (gene pqqA), thus
providing the glutamate and tyrosine residues necessary for the synthesis of PQQ.

- Yeast protein AXL1, which is involved in axial budding [3].
- Eimeria bovis sporozoite developmental protein.

- Escherichia coli hypothetical protein yddC and HI1368, the corresponding Haemophilus influenzae protein.

- Bacillus subtilis hypothetical protein ymxG. 

- Caenorhabditis elegans hypothetical proteins C28F5.4 and F56D2.1.

[0914] It should be noted that in addition to the above enzymes, this family also includes the core proteins I and II 
of the mitochondrial bc1 complex (also called cytochrome c reductase or complex III), but the situation as to the activity 
or lack of activity of these subunits is quite complex: 

- In mammals and yeast, core proteins I and II lack enzymatic activity.

- In Neurospora crassa and in potato, core protein I is equivalent to the beta subunit of MPP.

- In Euglena gracilis, core protein I seems to be active, while subunit II is inactive. 

[0915] These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal
section. This region includes a conserved histidine followed, two residues later, by a glutamate and another histidine.
In pitrilysin, it has been shown [4] that this H-x-x-E-H motif is involved in enzyme activity; the two histidines bind zinc
and the glutamate is necessary for catalytic activity. Non-active members of this family have lost from one to three of
these active site residues. We developed a signature pattern that detects active members of this family as well as some
inactive members.

[0916] Consensus pattern: G-x(8,9)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LMFAT]-x-[LFSTH]-x-
[GSTAN]-[GST] [The two H are zinc ligands] [E is the active site residue]. Sequences known to belong to this class
detected by the pattern: all active members, as well as all MPP alpha subunits and core II subunits. The pattern does
not detect inactive core I subunits.
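The text states that the two histidines of the H-x-x-E-H motif bind zinc, that the glutamate is needed for catalysis, and that inactive members have lost from one to three of these residues. A minimal sketch, assuming the motif window has already been located (for example by a sequence alignment), that counts how many of the three active-site residues a window retains and lists intact motifs in a sequence. The function names and the example windows are hypothetical.

```python
import re
from typing import List

# Positions within the 5-residue H-x-x-E-H window that carry active-site residues:
# index 0 and 4 are the zinc-binding histidines, index 3 is the catalytic glutamate.
ACTIVE_SITE = {0: "H", 3: "E", 4: "H"}


def retained_active_site_residues(window: str) -> int:
    """Count how many of the three H-x-x-E-H active-site residues a window keeps."""
    if len(window) != 5:
        raise ValueError("expected a 5-residue window aligned to H-x-x-E-H")
    return sum(1 for pos, residue in ACTIVE_SITE.items() if window[pos] == residue)


def find_hxxeh(sequence: str) -> List[int]:
    """Return 0-based start positions of every intact H-x-x-E-H motif.

    The zero-width lookahead lets overlapping motifs be reported.
    """
    return [m.start() for m in re.finditer(r"(?=H..EH)", sequence)]


# Hypothetical windows: intact motif, glutamate lost, all three residues lost.
for window in ("HLLEH", "HLLQH", "NLLQY"):
    print(window, retained_active_site_residues(window))
```

A window scoring 3 keeps the full zinc-binding and catalytic set; lower scores correspond to the partially or completely degenerated sites described for the non-active members.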

[0917] Note: these proteins belong to family M1 6 in the classification of peptidases [5]. 

[1] Rawlings N.D., Barrett A.J. Biochem. J. 275:389-391(1991).

[2] Braun H.-P., Schmitz U.K. Trends Biochem. Sci. 20:171-175(1995).

[3] Becker A.B., Roth R.A. Proc. Natl. Acad. Sci. U.S.A. 89:3835-3839(1992).

[4] Fujita A., Oka C., Arikawa Y., Katagai T., Tonouchi A., Kuhara S., Misumi Y. Nature 372:567-570(1994).
[ 5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995). 

[0918] 310. Involucrin repeat 

[0919] Eckert RL, Yaffe MB, Crish JF, Murthy S, Rorke EA, Welter JF; J Invest Dermatol 1993;100:613-617.
[0920] 311. Isochorismatase family. Members of this family are hydrolase enzymes.

[0921] Romao MJ, Turk D, Gomis-Ruth FX, Huber R, Schumacher G, Mollering H, Russmann L; J Mol Biol 1992;
226:1111-1130.
[0922] 312. Inositol monophosphatase family signatures (inositol_P)

It has been shown [1] that several proteins share two sequence motifs. Two of these proteins are enzymes of the
inositol phosphate second messenger signaling pathway: - Vertebrate and plant inositol monophosphatase (EC
3.1.3.25). - Vertebrate inositol polyphosphate 1-phosphatase (EC 3.1.3.57). The function of the other proteins is not
yet clear. - Bacterial protein cysQ. CysQ could help to control the pool of PAPS (3'-phosphoadenosine
5'-phosphosulfate), or be useful in sulfite synthesis. - Escherichia coli protein suhB. Mutations in suhB result in the
enhanced synthesis of heat shock sigma factor (htpR). - Neurospora crassa protein Qa-X. Probably involved in quinate
metabolism. - Emericella nidulans protein qutG. Probably involved in quinate metabolism. - Yeast protein HAL2/MET22
[2], involved in salt tolerance as well as methionine biosynthesis. - Yeast hypothetical protein YHR046c.
- Caenorhabditis elegans hypothetical protein F13G3.5. - A Rhizobium leguminosarum hypothetical protein encoded
upstream of the pss gene for exopolysaccharide synthesis. - Methanococcus jannaschii hypothetical protein MJ0109.
It is suggested [1] that these proteins may act by enhancing the synthesis or degradation of phosphorylated messenger
molecules. From the X-ray structure of human inositol monophosphatase [3], it seems that some of the conserved
residues are involved in binding a metal ion and/or the phosphate group of the substrate.
[0923] Consensus pattern: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-[DE]-[SG]-[ST]-x(2)-[FY]-x-[HKRNSTY] [The first D and
the T bind a metal ion]

Consensus pattern: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA]-
[1] Neuwald A.F., York J.D., Majerus P.W. FEBS Lett. 294:16-18(1991).

[2] Glaser H.-U., Thomas D., Gaxiola R., Montrichard P., Surdin-Kerjan Y., Serrano R. EMBO J. 12:3105-3110(1993).
[3] Bone R., Springer J.R., Atack J.R. Proc. Natl. Acad. Sci. U.S.A. 89:10031-10035(1992).
[0924] 313. Ion transport protein

[...] the last two helices flank a loop which determines ion selectivity. In some sub-families (e.g. Na channels) the domain [...]
[0926] 314. Isocitrate and isopropylmalate dehydrogenases signature (isodh)

[...] 3-Isopropylmalate dehydrogenase [3,4] catalyzes the third step in the biosynthesis of leucine in bacteria and fungi,
the oxidative decarboxylation of 3-isopropylmalate into 2-oxo-4-methylvalerate. Tartrate dehydrogenase (EC 1.1.1.93)
[5] [...] tartrate to oxaloglycolate. These enzymes are evolutionarily related. [...] is a glycine-rich stretch of residues
located in the [...].
[0927] Consensus pattern: [...]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[...]

[1] [...] Proc. Natl. Acad. Sci. U.S.A. 86 [...].

[2] Cupp J.R., McAlister-Henn L. J. Biol. Chem. 266:22199-22205(1991).

[3] Imada K., Sato M., Tanaka N., Katsube Y., Matsuura Y., Oshima T. J. Mol. Biol. 222:725-738(1991).
[4] Zhang T., Koshland D.E. Jr. Protein Sci. 4:84-92(1995).
[5] Tipton P.A., Beecher B.S. Arch. Biochem. Biophys. 313:15-21(1994).

[0928] 315. Jacalin-like lectin domain

[0929] Proteins containing this domain are lectins. It is found in 1 to 6 copies in these proteins. The domain is also
found in the animal prostatic spermine-binding protein.
[0930] [1] Sankaranarayanan R, Sekar K, Banerjee R, Sharma V, Surolia A, Vijayan M; Nat Struct Biol 1996;3.
[0931] 316. KH domain 

[...] cause paraneoplastic [...]

[1] Burd CG, Dreyfuss G; Science 1994;265:615-621.

[2] Musco G, Stier G, Joseph C, Castiglione Morelli MA, Nilges M, Gibson TJ, Pastore A; Cell 1996;85:237-245.
[0933] 317. Kelch motif 

[0934] The kelch motif was initially discovered in Kelch (Swiss-Prot [...]). [...] [2]. The kelch motif forms a beta sheet.
Several of these sheets associate to form a beta propeller structure as found [...]

[0935] [1] Bork P, Doolittle RF; J Mol Biol 1994;236:1277-1282. [2] Ito N, Phillips SE, Stevens C, Ogel ZB, McPherson
MJ, Keen JN, Yadav KD, Knowles PF; Nature 1991;350:87-90.

[0936] 318. Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature
which inhibit both cathepsin D (aspartic proteinase) and trypsin. - Alpha-amylase/subtilisin inhibitors from barley and
wheat. - Albumin-1 (WBA-1) from goa bean seeds [3]. - Miraculin from Richadella dulcifica [4], a sweet taste protein.
- Sporamin from sweet potato [5], the major tuberous root protein. - Thiol proteinase inhibitor PCPI 8.3 (P340) from
potato tuber [6]. - Wound-responsive protein gwin3 from poplar tree [7]. - 21 Kd seed protein from cocoa [8]. All these
proteins contain from 170 to 200 amino acid residues and one or two intrachain disulfide bonds. The best conserved
region is found in their N-terminal section and is used as a signature pattern.
[0938] Consensus pattern: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM]

[ 1] Laskowski M., Kato I. Annu. Rev. Biochem. 49:593-626(1980). 

[2] Ritonja A., Krizaj I., Mesko R., Kopitar M., Lucovnik R., Strukelj B., Pungercar J., Buttle D.J., Barrett A.J., Turk
V. FEBS Lett. 267:13-15(1990).

[3] Kortt A.A., Strike P.M., de Jersey J. Eur. J. Biochem. 181:403-408(1989).

[4] Theerasilp S., Hitotsuya H., Nakajo S., Nakaja K., Nakamura Y., Kurihara Y. J. Biol. Chem. 264:6655-6659(1989).

[5] Hattori T., Yoshida N., Nakamura K. Plant Mol. Biol. 13:563-572(1989).

[6] Krizaj I., Drobnic-Kosorok M., Brzin J., Jerala R., Turk V. FEBS Lett. 333:15-20(1993).

[7] Bradshaw H.D., Hollick J.B., Parsons T.J., Clarke H.R.G., Gordon M.P. Plant Mol. Biol. 14:51-59(1989).

[8] Tai H., McHenry L., Fritz R.J., Furtek D.B. Plant Mol. Biol. 16:913-915(1991).

[0939] 319. Beta-ketoacyl synthases active site 

Beta-ketoacyl-ACP synthase (KAS) [1] is the enzyme that catalyzes the condensation of malonyl-ACP with the growing
fatty acid chain. It is found as a component of the following enzymatic systems: - Fatty acid synthetase (FAS), which
catalyzes the formation of long-chain fatty acids from acetyl-CoA, malonyl-CoA and NADPH. Bacterial and plant
chloroplast FAS are composed of eight separate subunits which correspond to different enzymatic activities;
beta-ketoacyl synthase is one of these polypeptides. Fungal FAS consists of two multifunctional proteins, FAS1 and
FAS2; the beta-ketoacyl synthase domain is located in the C-terminal section of FAS2. Vertebrate FAS consists of a
single multifunctional chain; the beta-ketoacyl synthase domain is located in the N-terminal section [2]. - The
multifunctional 6-methylsalicylic acid synthase (MSAS) from Penicillium patulum [3]. This is a multifunctional enzyme
involved in the biosynthesis of a polyketide antibiotic and which has a KAS domain in its N-terminal section.
- Polyketide antibiotic synthase enzyme systems. Polyketides are secondary metabolites produced by microorganisms
and plants from simple fatty acids. KAS is one of the components involved in the biosynthesis of the Streptomyces
polyketide antibiotics granaticin [4], tetracenomycin C [5] and erythromycin. - Emericella nidulans multifunctional
protein wA. wA is involved in the biosynthesis of conidial green pigment. wA is a protein of 216 Kd that contains a KAS
domain. - Rhizobium nodulation protein nodE, which probably acts as a beta-ketoacyl synthase in the synthesis of the
nodulation Nod factor fatty acyl chain. - Yeast mitochondrial protein CEM1. The condensation reaction is a two-step
process: the acyl component of an activated acyl primer is transferred to a cysteine residue of the enzyme and is then
condensed with an activated malonyl donor with the concomitant release of carbon dioxide. The sequence around the
active site cysteine is well conserved and can be used as a signature pattern.

[0940] Consensus pattern: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF] [C is the active site residue]
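Because the pattern marks the cysteine as the active-site residue, a search can report where that cysteine lies in a sequence rather than only whether the pattern matches. A minimal sketch in Python, with the pattern above hand-translated into a regular expression and the cysteine placed in a capture group; the function name and the test sequence are hypothetical.

```python
import re
from typing import List

# G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF], with the active-site C captured.
KAS_ACTIVE_SITE = re.compile(r"G.{4}[LIVMFAP].{2}[AGC](C)[STA]{2}[STAG].{3}[LIVMF]")


def active_site_cysteines(sequence: str) -> List[int]:
    """Return 1-based positions of cysteines matched as the KAS active-site residue."""
    return [m.start(1) + 1 for m in KAS_ACTIVE_SITE.finditer(sequence)]


# Made-up sequence built to satisfy the pattern once.
print(active_site_cysteines("MGAAAALAAACSSGAAAVK"))
```

`finditer` reports non-overlapping matches, which is adequate here because a single 20-residue match is expected per KAS domain.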



[1] Kauppinen S., Siggaard-Andersen M., von Wettstein-Knowles P. Carlsberg Res. Commun. 53:357-370(1988).

[2] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).

[3] Beck J., Ripka S., Siegner A., Schiltz E., Schweizer E. Eur. J. Biochem. 192:487-498(1990).

[4] Bibb M.J., Biro S., Motamedi H., Collins J.R., Hutchinson C.R. EMBO J. 8:2727-2736(1989).

[5] Sherman D.H., Malpartida F., Bibb M.J., Kieser H.M., Bibb M.J., Hopwood D.A. EMBO J. 8:2717-2725(1989).

[0941] 320. Kinesin motor domain signature and profile 

Kinesin [1,2,3] is a microtubule-associated force-producing protein that may play a role in organelle transport. Kinesin
is an oligomeric complex composed of two heavy chains and two light chains. The kinesin motor activity is directed
toward the microtubule's plus end. The heavy chain is composed of three structural domains: a large globular N-terminal
domain which is responsible for the motor activity of kinesin (it is known to hydrolyze ATP, to bind and move on
microtubules), a central alpha-helical coiled coil domain that mediates the heavy chain dimerization, and a small
globular C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and
membranous organelles. A number of proteins have been recently found that contain a domain similar to that of the
kinesin 'motor' domain [1,4]: - Drosophila claret segregational protein (ncd). Ncd is required for normal chromosomal
segregation in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed
toward the microtubule's minus end. - Drosophila kinesin-like protein (nod). Nod is required for the distributive
chromosome segregation of nonexchange chromosomes during meiosis. - Human CENP-E [4]. CENP-E is a protein that
associates with kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is
quantitatively discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome
movement and/or spindle elongation. - Human mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is
directed toward the microtubule's plus end. - Yeast KAR3 protein, which is essential for yeast nuclear fusion during
mating. KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis. - Yeast CIN8 and KIP1
proteins, which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle
microtubules to produce an outwardly directed force acting upon the poles. - Fission yeast cut7 protein, which is
essential for spindle pole body duplication during mitotic division. - Emericella nidulans bimC, which plays an important
role in nuclear division. - Emericella nidulans klpA. - Caenorhabditis elegans unc-104, which may be required for the
transport of substances needed for neuronal cell differentiation. - Caenorhabditis elegans osm-3. - Xenopus Eg5, which
may be involved in mitosis. - Arabidopsis thaliana KatA, KatB and KatC. - Chlamydomonas reinhardtii FLA10/KHP1 and
KLP1. Both proteins seem to play a role in the rotation or twisting of the microtubules of the flagella. - Caenorhabditis
elegans hypothetical protein T09A5.2. The kinesin motor domain is located in the N-terminal part of most of the above
proteins, with the exception of KAR3, klpA, and ncd, where it is located in the C-terminal section. The kinesin motor
domain contains about 330 amino acids. An ATP-binding motif of type A is found near position 80 to 90; the C-terminal
half of the domain is involved in microtubule-binding. The signature pattern for that domain is derived from a conserved
decapeptide inside the microtubule-binding part.

Consensus pattern: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-[AH]-G-[SAN]-E

[1] Bloom G.S., Endow S.A. Protein Prof. 2:1109-1171(1995).

[2] Vallee R.B., Shpetner H.S. Annu. Rev. Biochem. 59:909-932(1990).

[3] Brady S.T. Trends Cell Biol. 5:159-164(1995).

[4] Endow S.A. Trends Biochem. Sci. 16:221-225(1991).

[0942] 321. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups:
- Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded). - Archaebacterial L15. - Vertebrate L27a. - Tetrahymena
thermophila L29. - Fungi L27a (L29, CRP-1, CYH2). L15 is a protein of 144 to 154 amino-acid residues. As a signature
pattern, a conserved region was selected in the C-terminal section of these proteins.

[0943] Consensus pattern: K-[LIVM](2)-[GASL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-[LIVMFCA]-
x(2)-A-x(3)-[LIVM]-x(3)-G

[0944] [1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[0945] 322. LBP / BPI / CETP family signature

The following mammalian lipid-binding serum glycoproteins belong to the same family [1,2,3]: - Lipopolysaccharide-
binding protein (LBP). LBP binds to the lipid A moiety of bacterial lipopolysaccharides (LPS), a glycolipid present in
the outer membrane of all Gram-negative bacteria. The LBP/LPS complex seems to interact with the CD14 receptor
and may be responsible for the secretion of alpha-TNF. - Bactericidal permeability-increasing protein (BPI). Like LBP,
BPI binds LPS and has a cytotoxic activity on Gram-negative bacteria. - Cholesteryl ester transfer protein (CETP).
CETP is involved in the transfer of insoluble cholesteryl esters in reverse cholesterol transport. - Phospholipid transfer
protein (PLTP). PLTP may play a key role in extracellular phospholipid transport and modulation of HDL particles. These
proteins are structurally related and share many regions of sequence similarity. As a signature pattern, one of these
regions was selected; it is located in the N-terminal section of these proteins and could be involved in the binding to
the lipids [2].

Consensus pattern: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-[EQ]-x(4)-[LIVM]-[EQK]-x(8)-P

[1] Schumann R.R., Leong S.R., Flaggs G.W., Gray P.W., Wright S.D., Mathison J.C., Tobias P.S., Ulevitch R.J.
Science 249:1429-1431(1990).

[2] Gray P.W., Flaggs G., Leong S.R., Gumina R.J., Weiss J., Ooi C.E., Elsbach P. J. Biol. Chem. 264:9505-9509(1989).

[3] Day J.R., Albers J.J., Lofton-Day C.E., Gilbert T.L., Ching A.R.T., Grant F.J., O'Hara P.J., Marcovina S.M.,
Adolphson J.L. J. Biol. Chem. 269:9388-9391(1994).

[0946] 323. LIM domain signature and profile

Recently [1,2] a number of proteins have been found to contain a conserved cysteine-rich domain of about 60 amino-acid
residues. These proteins are: - Caenorhabditis elegans mec-3, a protein required for the differentiation of the set
of six touch receptor neurons in this nematode. - Caenorhabditis elegans lin-11, a protein required for the asymmetric
division of vulval blast cells. - Vertebrate insulin gene enhancer binding protein isl-1. Isl-1 binds to one of the two
cis-acting protein-binding domains of the insulin gene. - Vertebrate homeobox proteins lim-1, lim-2 (lim-5) and lim-3.
- Vertebrate lmx-1, which acts as a transcriptional activator by binding to the FLAT element, a beta-cell-specific
transcriptional enhancer found in the insulin gene. - Mammalian LH-2, a transcriptional regulatory protein involved in the
control of cell differentiation in developing lymphoid and neural cell types. - Drosophila protein apterous, required for
the normal development of the wing and haltere imaginal discs. - Vertebrate protein kinases LIMK-1 and LIMK-2.
- Mammalian rhombotins. Rhombotin-1 (RBTN1 or TTG-1) and rhombotin-2 (RBTN2 or TTG-2) are proteins of about
160 amino acids whose genes are disrupted by chromosomal translocations in T-cell leukemia. - Mammalian and avian
cysteine-rich protein (CRP), a 192 amino-acid protein of unknown function that seems to interact with zyxin.
- Mammalian cysteine-rich intestinal protein (CRIP), a small protein which seems to have a role in zinc absorption and
may function as an intracellular zinc transport protein. - Vertebrate paxillin, a cytoskeletal focal adhesion protein.
- Mouse testin. Mouse testin should not be confused with rat testin, which is a thiol protease homolog. - Sunflower
pollen-specific protein SF3. - Chicken zyxin. Zyxin is a low-abundance adhesion plaque protein which has been shown
to interact with CRP. - Yeast protein LRG1, which is involved in sporulation [4]. - Yeast rho-type GTPase activating
protein RGA1/DBM1. - Caenorhabditis elegans homeobox protein ceh-14. - Caenorhabditis elegans homeobox protein
unc-97. - Yeast hypothetical protein YKR090w. - Caenorhabditis elegans hypothetical protein C28H8.6. These proteins
generally have two tandem copies of a domain, called LIM (for Lin-11, Isl-1, Mec-3), in their N-terminal section. Zyxin
and paxillin are exceptions in that they contain respectively three and four LIM domains at their C-terminal extremity.
In apterous, isl-1, LH-2, lin-11, lim-1 to lim-3, lmx-1, ceh-14 and mec-3 there is a homeobox domain some 50 to 95
amino acids after the LIM domains. In the LIM domain, there are seven conserved cysteine residues and a histidine.
The arrangement followed by these conserved residues is C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-
x(2,3)-[CHD]. The LIM domain binds two zinc ions [5]. LIM does not bind DNA; rather, it seems to act as an interface
for protein-protein interaction. A pattern was developed that spans the first half of the LIM domain.

[0947] Consensus pattern: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF] [The 5 C's and the H
bind zinc]
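The full-domain arrangement given in the text, C-x(2)-C-x(16,23)-H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD], can be written as a regular expression with one capture group per conserved residue, which lists the putative metal-coordinating positions of a match. A minimal sketch in Python under that assumption only; it does not reproduce the LIM profile mentioned in the heading, and the function name and the synthetic all-alanine domain are hypothetical.

```python
import re
from typing import List, Optional

# Full-domain arrangement from the text; each conserved residue is captured.
LIM_ARRANGEMENT = re.compile(
    r"(C).{2}(C).{16,23}(H).{2}([CH]).{2}(C).{2}(C).{16,21}(C).{2,3}([CHD])"
)


def lim_conserved_positions(sequence: str) -> Optional[List[int]]:
    """Return 0-based positions of the eight conserved residues of the first match, or None."""
    m = LIM_ARRANGEMENT.search(sequence)
    if m is None:
        return None
    return [m.start(i) for i in range(1, 9)]


# Synthetic domain: every spacer is alanine and each spacer has its minimum length.
demo = (
    "C" + "A" * 2 + "C" + "A" * 16 + "H" + "A" * 2 + "C" + "A" * 2
    + "C" + "A" * 2 + "C" + "A" * 16 + "C" + "A" * 2 + "D"
)
print(lim_conserved_positions(demo))
```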

[1] Freyd G., Kim S.K., Horvitz H.R. Nature 344:876-879(1990).
[2] Baltz R., Evrard J.-L., Domon C., Steinmetz A. Plant Cell 4:1465-1466(1992).
[3] Sanchez-Garcia I., Rabbitts T.H. Trends Genet. 10:315-320(1994).
[4] Mueller A., Xu G., Wells R., Hollenberg C.P., Piepersberg W. Nucleic Acids Res. 22:3151-3154(1994).

[5] Michelsen J.W., Schmeichel K.L., Beckerle M.C., Winge D.R. Proc. Natl. Acad. Sci. U.S.A. 90:4404-4408(1993).

[0948] 324. (LRR) Leucine Rich Repeat

CAUTION: This Pfam family may not find all Leucine Rich Repeats in a protein. Leucine Rich Repeats are short sequence
motifs present in a number of proteins with diverse functions and cellular locations. These repeats are usually involved
in protein-protein interactions. Each Leucine Rich Repeat is composed of a beta-alpha unit. These units form elongated
non-globular structures. Leucine Rich Repeats are often flanked by cysteine-rich domains. Number of members: 3017
[1] The leucine-rich repeat: a versatile binding motif. Kobe B, Deisenhofer J; Trends Biochem Sci 1994;19:415-421.

[2] Crystal structure of porcine ribonuclease inhibitor, a protein with leucine-rich repeats. Kobe B, Deisenhofer J; Nature
1993;366:751-756.

[0949] 325. Plant lipid transfer protein family signature (LTP) 

[0950] Plant cells contain proteins, called lipid transfer proteins (LTP) [1,2,3], which are able to facilitate the transfer
of phospholipids and other lipids across membranes. These proteins, whose subcellular location is not yet known, could
play a major role in membrane biogenesis by conveying phospholipids such as waxes or cutin from their site of
biosynthesis to membranes unable to form these lipids. Plant LTPs are proteins of about 9 Kd (90 amino acids) which
contain eight conserved cysteine residues, all involved in disulfide bridges, as shown in the following schematic
representation.

xCxxxxCxxxxxxCCxxxxxxxxCxCxxxxxxxxxxxCxxxxxxCxx

C: conserved cysteine involved in a disulfide bond.
[1] Wirtz K.W.A. Annu. Rev. Biochem. 60:73-99(1991).
[2] Arondel V., Kader J.-C. Experientia 46:579-585(1990).

[3] Ohlrogge J.B., Browse J., Somerville C.R. Biochim. Biophys. Acta 1082:1-26(1991).
[0952] 326. (LAMP) Lysosome-associated membrane glycoproteins signatures

disulfide bonds. This structure is schematically represented below.

xCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxxxCxxxxxCxxxxxxxxxxxxCxxxxxCxxxxxxxx
<  Hinge  ><TM><C>

[1] Fukuda M. J. Biol. Chem. 266:21327-21330(1991).

[2] Holness C.L., da Silva R.R., Fawcett J., Gordon S., Simmons D.L. J. Biol. Chem. 268:9661-9666(1993).
[0954] 327. Lipolytic enzymes "G-D-S-L" family, serine active site 

- Aeromonas hydrophila lipase/phosphatidylcholine-sterol acyltransferase.
- Xenorhabdus luminescens lipase 1.
- Vibrio mimicus arylesterase.

- Escherichia coli acyl-CoA thioesterase I (gene tesA).

- Vibrio parahaemolyticus thermolabile hemolysin/atypical phospholipase.

- Arabidopsis thaliana and Brassica napus anther-specific proline-rich protein APG.

[...] the active site [...]

Consensus pattern: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G [S is the active site residue]
for recent listings see [1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli
lipoprotein-28 (gene nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli
osmotically inducible lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF
(or nlpE). - Escherichia coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of
Bacillus beta-lactamases. - Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi
outer surface proteins A and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7
(gene vmp7). - Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes endoglucanase
cel-3. - Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella pullulanase
secretion protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens A, B, and C
(genes vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome subunit
(gene cytC). - Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. - Streptococcus
pneumoniae oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema pallidum
membrane protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natrobacterium pharaonis [4], a membrane-associated copper-binding protein. This is the first
archaebacterial protein known to be modified in such a fashion. From the precursor sequences of all these proteins, a
consensus pattern and a set of rules to identify this type of post-translational modification was derived.

Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
[...] There must be at least one Lys or one Arg in the first seven positions of the sequence.

[1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[3] von Heijne G. Protein Eng. 2:531-534(1989).

[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).



329. Prokaryotic membrane lipoprotein lipid attachment site. In prokaryotes, membrane lipoproteins are synthesized
with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The
peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride-fatty acid
lipid is attached. Some of the proteins known to undergo such processing currently include (for recent listings see
[1,2,3]): - Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp). - Escherichia coli lipoprotein-28 (gene
nlpA). - Escherichia coli lipoprotein-34 (gene nlpB). - Escherichia coli lipoprotein nlpC. - Escherichia coli lipoprotein
nlpD. - Escherichia coli osmotically inducible lipoprotein B (gene osmB). - Escherichia coli osmotically inducible
lipoprotein E (gene osmE). - Escherichia coli peptidoglycan-associated lipoprotein (gene pal). - Escherichia coli rare
lipoproteins A and B (genes rplA and rplB). - Escherichia coli copper homeostasis protein cutF (or nlpE). - Escherichia
coli plasmids traT proteins. - Escherichia coli Col plasmids lysis proteins. - A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA). - Borrelia burgdorferi outer surface proteins A
and B (genes ospA and ospB). - Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3). - Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp. - Klebsiella pullulanase (gene pulA). - Klebsiella pullulanase secretion
protein pulS. - Mycoplasma hyorhinis protein p37. - Mycoplasma hyorhinis variant surface antigens A, B, and C (genes
vlpABC). - Neisseria outer membrane protein H.8. - Pseudomonas aeruginosa lipopeptide (gene lppL). - Pseudomonas
solanacearum endoglucanase egl. - Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen. - Shigella flexneri invasion plasmid proteins mxiJ and mxiM. - Streptococcus pneumoniae
oligopeptide transport protein A (gene amiA). - Treponema pallidum 34 Kd antigen. - Treponema pallidum membrane
protein A (gene tmpA). - Vibrio harveyi chitobiase (gene chb). - Yersinia virulence plasmid protein yscJ. - Halocyanin
from Natrobacterium pharaonis [4], a membrane-associated copper-binding protein. This is the first archaebacterial
protein known to be modified in such a fashion. From the precursor sequences of all these proteins, a consensus
pattern and a set of rules to identify this type of post-translational modification have been developed.

Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
[...] There must be at least one Lys or one Arg in the first seven positions of the sequence.
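The consensus pattern and the Lys/Arg rule above can be combined into a short screening function for candidate lipid-attachment cysteines. A minimal sketch in Python, assuming the input is the N-terminal region of a precursor sequence; only the pattern and the single rule legible here are modeled, and the function name and example sequences are hypothetical.

```python
import re
from typing import List

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C; the lookahead allows overlapping
# candidates and the inner group captures the lipid-attachment cysteine.
LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS](C))")


def lipid_attachment_sites(precursor: str) -> List[int]:
    """Return 1-based positions of candidate lipid-attachment cysteines.

    Candidates are reported only if at least one Lys or Arg occurs in the
    first seven positions, as required by the rule in the text.
    """
    if not re.search(r"[KR]", precursor[:7]):
        return []
    return [m.start(1) + 1 for m in LIPOBOX.finditer(precursor)]


print(lipid_attachment_sites("MKQFLLSLLLAGCSN"))  # made-up precursor
print(lipid_attachment_sites("MAQFLLSLLLAGCSN"))  # same, but no Lys/Arg near the start
```

The first call reports the cysteine at position 13; the second returns an empty list because the Lys/Arg rule fails even though the pattern itself matches.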

[0960] [1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990). [2] Klein P., Somorjai R.L., Lau P.C.K.
Protein Eng. 2:15-20(1988). [3] von Heijne G. Protein Eng. 2:531-534(1989). [4] Mattar S., Scharf B., Kent S.B.H.,
Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).
[0961] 330. (Lum binding) Riboflavin synthase alpha chain family Lum-binding site signature. The following proteins
have been shown [1,2] to be structurally and evolutionarily related: - Riboflavin synthase alpha chain (RS-alpha) (gene
ribC in Escherichia coli, ribB in Bacillus subtilis and Photobacterium leiognathi, RIB5 in yeast). This enzyme synthesizes
riboflavin from two moles of 6,7-dimethyl-8-(1'-D-ribityl)lumazine (Lum), a pteridine derivative. - Photobacterium
phosphoreum lumazine protein (LumP) (gene luxL). LumP is a protein that modulates the color of the bioluminescence
emission of bacterial luciferase. In the presence of LumP, light emission is shifted to higher energy values (shorter
wavelength). LumP binds non-covalently to 6,7-dimethyl-8-(1'-D-ribityl)lumazine. - Vibrio fischeri yellow fluorescent
protein (YFP) (gene luxY). Like LumP, YFP modulates light emission, but towards a longer wavelength. YFP binds
non-covalently to FMN. These proteins seem to have evolved from the duplication of a domain of about 100 residues. In
its C-terminal section, this domain contains a conserved motif [KR]-V-N-[LI]-E which has been proposed to be the
binding site for Lum. RS-alpha, which binds two molecules of Lum, has two perfect copies of this motif, while LumP,
which binds one molecule of Lum, has a Glu instead of Lys/Arg in the first position of the second copy of the motif.
Similarly, YFP, which binds one molecule of FMN, also seems to have a potentially dysfunctional binding site, by
substitution of Gly for Glu in the last position of the first copy of the motif. Our signature pattern includes the
Lum-binding motif.
[0962] Consensus pattem: [LIVMF]-x(5)-G-[STADNQ]-[KREQIYW]-V-N-[LIVM]-E 
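To make the copy-counting argument above concrete, the following Python sketch scans a sequence for segments resembling the [KR]-V-N-[LI]-E motif and flags which of them are perfect copies. The relaxed search (any residue allowed at the first and last positions) is an assumption made for illustration, `motif_copies` is a hypothetical helper, and the input is a toy sequence rather than a real RS-alpha, LumP or YFP sequence.

```python
import re

# Core Lum-binding motif described above: [KR]-V-N-[LI]-E.
PERFECT = re.compile(r"[KR]VN[LI]E")
# Assumed relaxed form that tolerates a substitution at the first or last
# position, so degenerate copies (as described for LumP and YFP) are reported too.
RELAXED = re.compile(r"(?=(.VN[LI].))")


def motif_copies(seq: str) -> list[tuple[int, str, bool]]:
    """List (1-based start, segment, is_perfect) for each motif-like segment."""
    return [
        (m.start(1) + 1, m.group(1), PERFECT.fullmatch(m.group(1)) is not None)
        for m in RELAXED.finditer(seq)
    ]


# Toy sequence with one perfect copy and one copy carrying Glu at the first position.
print(motif_copies("AAKVNLEAAAEVNIEAA"))
```

Here the first segment (KVNLE) is reported as a perfect copy and the second (EVNIE) as a substituted one, mirroring the LumP situation in miniature.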

[ 1] O'Kane D.J., Woodward B., Lee J., Prasher D.C. Proc. Natl. Acad. Sci. U.S.A. 88:1100-1104(1991).
[ 2] O'Kane D.J., Prasher D.C. Mol. Microbiol. 6:443-449(1992).

[0963] 331. Lysyl oxidase putative copper-binding region signature

Lysyl oxidase (LOX) [1] is an extracellular copper-dependent enzyme that catalyzes the oxidative deamination of peptidyl lysine residues in precursors of various collagens and elastins. The deaminated lysines are then able to form aldehyde cross-links. LOX binds a single copper atom, which seems to reside within an octahedral coordination complex that includes at least three histidine ligands. Four histidine residues are clustered in a central region of the enzyme. This region is thought to be involved in copper binding and is called the 'copper-talon' [1]. This region was used as a signature pattern.

[0964] Consensus pattern: W-E-W-H-S-C-H-Q-H-Y-H
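Because the copper-talon signature contains no wildcards, locating it and listing its histidines needs only a plain string search. A minimal Python sketch, with a hypothetical helper name and a made-up flanking sequence:

```python
# Copper-talon signature from above (fully specified, no wildcards).
TALON = "WEWHSCHQHYH"


def talon_histidines(seq: str, motif: str = TALON) -> list[int]:
    """Return 1-based positions of the His residues in the first talon match."""
    start = seq.find(motif)
    if start < 0:
        return []
    return [start + i + 1 for i, residue in enumerate(motif) if residue == "H"]


print(talon_histidines(TALON))               # positions within the motif itself
print(talon_histidines("MAA" + TALON + "GG"))  # positions in a toy sequence
```

Within the motif the histidines fall at positions 4, 7, 9 and 11, the four clustered histidines mentioned above.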

[0965] [ 1] Krebs C.J., Krawetz S.A. Biochim. Biophys. Acta 1202:7-12(1993). 

[0966] 332. Metallo-beta-lactamase superfamily (lactamase_B) 

[0967] [1] Neuwald AF, Liu JS, Lipman DJ, Lawrence CE. Nucleic Acids Res 1997;25:1665-1677. [2] Carfi A, Pares S, Duee E, Galleni M, Duez C, Frere JM, Dideberg O. EMBO J 1995;14:4914-4921.
[0968] 333. L-lactate dehydrogenase active site (ldh1)

L-lactate dehydrogenase (EC 1.1.1.27) (LDH) [1] catalyzes the reversible NAD-dependent interconversion of pyruvate to L-lactate. In vertebrate muscles and in lactic acid bacteria it represents the final step in anaerobic glycolysis. This tetrameric enzyme is present in prokaryotic and eukaryotic organisms. In vertebrates there are three isozymes of LDH: the M form (LDH-A), found predominantly in muscle tissues; the H form (LDH-B), found in heart muscle; and the X form (LDH-C), found only in the spermatozoa of mammals and birds. In birds and crocodilian eye lenses, LDH-B serves as a structural protein and is known as epsilon-crystallin [2]. L-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (L-hicDH) [3] catalyzes the reversible and stereospecific interconversion between 2-ketocarboxylic acids and L-2-hydroxycarboxylic acids. L-hicDH is evolutionarily related to LDHs. As a signature for LDHs, a region was selected that includes a conserved histidine which is essential to the catalytic mechanism.
[0969] Consensus pattern: [LIVMA]-G-[EQ]-H-G-[DN]-[ST] [H is the active site residue]
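The active-site annotation in the pattern above can be turned into a sequence coordinate by capturing that element in a regular expression. This is a sketch only: `active_site_positions` is a hypothetical helper and the input is a toy peptide, not an LDH sequence.

```python
import re

# LDH signature from above, one element per entry; element 3 is the annotated His.
LDH_ELEMENTS = ["[LIVMA]", "G", "[EQ]", "H", "G", "[DN]", "[ST]"]
ACTIVE_SITE = 3


def active_site_positions(seq: str) -> list[int]:
    """Return 1-based positions of the annotated His in every signature hit."""
    parts = list(LDH_ELEMENTS)
    parts[ACTIVE_SITE] = "(" + parts[ACTIVE_SITE] + ")"  # capture the catalytic residue
    regex = re.compile("".join(parts))
    return [m.start(1) + 1 for m in regex.finditer(seq)]


print(active_site_positions("MKVGEHGDSAA"))  # toy peptide
```

Capturing only the annotated element keeps the rest of the pattern unchanged while reporting the residue of interest directly.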

[ 1] Abad-Zapatero C., Griffith J.P., Sussman J.L., Rossmann M.G. J. Mol. Biol. 198:445-467(1987).

[ 2] Hendriks W., Mulders J.W.M., Bibby M.A., Slingsby C., Bloemendal H., de Jong W.W. Proc. Natl. Acad. Sci. U.S.A. 85:7114-7118(1988).

[ 3] Lerch H.-R., Frank R., Collins J. Gene 83:263-270(1989).
[0970] Malate dehydrogenase active site signature (ldh2)

Malate dehydrogenase (EC 1.1.1.37) (MDH) [1,2] catalyzes the interconversion of malate to oxaloacetate utilizing the NAD/NADH cofactor system. The enzyme participates in the citric acid cycle and exists in all aerobic organisms. While prokaryotic organisms contain a single form of MDH, in eukaryotic cells there are two isozymes: one which is located in the mitochondrial matrix and the other in the cytoplasm. Fungi and plants also harbor a glyoxysomal form which functions in the glyoxylate pathway. In plant chloroplasts there is an additional NADP-dependent form of MDH (EC 1.1.1.82) which is essential for both the universal C3 photosynthesis (Calvin) cycle and the more specialized C4 cycle. As a signature pattern for this enzyme, a region was chosen that includes two residues involved in the catalytic mechanism [3]: an aspartic acid which is involved in a proton relay mechanism, and an arginine which binds the substrate.
[0971] Consensus pattern: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY] [D and R are the active site residues]



[ 1] McAlister-Henn L. Trends Biochem. Sci. 13:178-181(1988).

[ 2] Gietl C. Biochim. Biophys. Acta 1100:217-234(1992).

[ 3] Birktoft J.J., Rhodes G., Banaszak L.J. Biochemistry 28:6065-6081(1989).

[ 4] Cendrin R., Chroboczek J., Zaccai G., Eisenberg H., Mevarech M. Biochemistry 32:4308-4313(1993).
[0972] 334. Legume lectins signatures 

[...] found in the seeds. The exact function of legume lectins is not known, but they may be involved in the attachment of nitrogen-fixing bacteria to legumes and in the protection against pathogens. Legume lectins bind [...] residues. Some legume lectins are proteolytically processed to produce two chains: beta (which corresponds to the N-terminal part) and alpha (C-terminal). The lectin concanavalin A (conA) from jack bean is exceptional in [...] of conA thus corresponds to that of the alpha chain and the C-terminus to that of the beta chain. Two signature patterns specific to legume lectins have been developed: the first is located in the C-terminal section of the beta chain and contains a conserved aspartic acid residue important for the binding of calcium and manganese; the second one is located in the N-terminal section of the alpha chain.

[0973] Consensus pattern: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST] [D binds manganese and calcium]
Consensus pattern: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIVF]-G-[LF]-[ST]

[ 1] Sharon N., Lis H. FASEB J. 4:3198-320(1990). 

[ 2] Lis H., Sharon N. Annu. Rev. Biochem. 55:33-37(1986).

[0974] 335. CoA-ligases (ligases-CoA)

[...] succinyl-CoA synthetase alpha and beta chains, malate CoA ligase and ATP-citrate lyase. Some members of the family utilise ATP, others use GTP.

[1] Wolodko WT, Fraser ME, James MN, Bridger WA. J Biol Chem 1994;269:10883-10890.
[0977] 336. Linker histone H1 and H5 family

[...] component of chromatin structure. H1 links nucleosomes into higher-order structures. Histone H1 is replaced by histone H5 in some cell types.

[1] Ramakrishnan V, Finch JT, Graziano V, Lee PL, Sweet RM. Nature 1993;362:219-223.
[0980] 337. Lipocalin signature (lip1)

Proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids, and lipids share limited regions [...] The name 'lipocalin' has been proposed [...] to this family are listed below (references are only provided for recently determined sequences). - Alpha-1-microglobulin (protein HC), which seems to bind porphyrin. - Alpha-1-acid glycoprotein (orosomucoid), which can bind a remarkable array of natural and synthetic compounds [6]. - Aphrodisin, which, in hamsters, functions as an aphrodisiac pheromone. - Apolipoprotein D, which probably binds heme-related compounds. - Beta-lactoglobulin, a milk protein whose physiological function appears to be retinol binding. - [...] binds astaxanthin, a carotenoid. - Epididymal retinoic acid-binding protein (E-RABP) [9], involved in sperm maturation. - Insectacyanin, a moth bilin-binding protein, and a related butterfly bilin-binding protein (BBP). - [...] gelatinase-associated lipocalin (NGAL) (p25) [...] which binds odorants. - Plasma retinol-binding protein (RBP). - Human pregnancy-associated endometrial alpha-2 globulin. - Probasin (PB), a rat prostatic protein. - Prostaglandin D synthase (GSH-independent PGD synthetase), a lipocalin with enzymatic activity [...] specific protein p20K from chicken embryo [...] pheromone transport proteins from mouse vomeronasal organ [13]. - Von Ebner's gland protein (VEGP) [14] [...] recognition. - A frog olfactory protein [...] cerebrospinal fluid of the toad Bufo marinus with a [...] which could transport small hydrophobic molecules into the epididymal fluid [...] outer-membrane protein blc [17]. The sequences of most members of the family, the core or kernel lipocalins, are characterized by three short conserved stretches of residues [3,18]. Others, the outlier lipocalin group, share only one or two of these [3,18]. A signature pattern was built around the first, common to all outlier and kernel lipocalins, which
occurs near the start of the first beta-strand. 

[0981] Consensus pattern: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-{CP}-G-{C}-W-[FYWLRH]-x
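The lipocalin pattern uses two constructs not seen in the simpler signatures: a variable gap, x(0,2), and exclusions written in braces, {CP} and {C}. The Python sketch below translates the pattern exactly as printed above into a regular expression, one element per line; `has_lipocalin_signature` is a hypothetical helper, and the three test segments are invented to show a zero-length gap, a two-residue gap, and a Pro at the excluded position.

```python
import re

# Hand translation of the lipocalin pattern as printed above, element by element.
LIPOCALIN = re.compile(
    "[DENG]"          # [DENG]
    "."               # x
    "[DENQGSTARK]"    # [DENQGSTARK]
    ".{0,2}"          # x(0,2): a gap of zero, one or two residues
    "[DENQARK]"       # [DENQARK]
    "[LIVFY]"         # [LIVFY]
    "[^CP]"           # {CP}: any residue except Cys or Pro
    "G"               # G
    "[^C]"            # {C}: any residue except Cys
    "W"               # W
    "[FYWLRH]"        # [FYWLRH]
    "."               # x
)


def has_lipocalin_signature(seq: str) -> bool:
    return LIPOCALIN.search(seq) is not None


# Toy segments (not real proteins): gap of 0, gap of 2, and Pro at the {CP} position.
for segment in ["DADNLAGKWYS", "DADSSNLAGKWYS", "DADNLPGKWYS"]:
    print(segment, has_lipocalin_signature(segment))
```

The first two segments match (the gap absorbs zero and two residues respectively), while the third is rejected because Pro sits at the position excluded by {CP}.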

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall 
superfamily, called the calycins, with the avidin/streptavidin <PDOC00499> and the cytosolic fatty-acid binding proteins
<PDOC00188> families [3,19].

[ 1] Cowan S.W., Newcomer M.E., Jones T.A. Proteins 8:44-61(1990).

[ 2] Igaraishi M., Nagata A., Toh H., Urade H., Hayaishi H. Proc. Natl. Acad. Sci. U.S.A. 89:5376-5380(1992).
[ 3] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[ 4] Godovac-Zimmermann J. Trends Biochem. Sci. 13:64-66(1988). 
[ 5] Pervaiz S., Brew K. FASEB J. 1:209-214(1987). 

[ 6] Kremer J.M.H., Wilting J., Janssen L.H.M. Pharmacol. Rev. 40:1-47(1989). 

[ 7] Haefliger J. -A., Peitsch M.C., Jenne D., Tschopp J. Mol. Immunol. 28:123-131(1991). 

[ 8] Keen J.N., Caceres L., Eliopoulos E.E., Zagalsky R.F., Findlay J.B.C. Eur. J. Biochem. 197:407-417(1991).

[ 9] Newcomer M.E. Structure 1:7-18(1993). 

[10] Collet C., Joseph R. Biochim. Biophys. Acta 1167:219-222(1993).

[11] Kjeldsen L., Johnsen A.H., Sengelov H., Borregaard N. J. Biol. Chem. 268:10425-10432(1993).

[12] Peitsch M.C., Boguski M.S. Trends Biochem. Sci. 16:363-363(1991).

[13] Miyawaki A., Matsushita Y.R., Ryo Y., Mikoshiba T. EMBO J. 13:5835-5842(1994).

[14] Kock K., Ahlers C., Schmale H. Eur. J. Biochem. 221:905-916(1994).

[15] Achen M.G., Harms R.J., Thomas T., Richardson S.J., Wettenhall R.E.H., Schreiber G. J. Biol. Chem. 267:23170-23174(1992).

[16] Morel L., Dufarre J.-R., Depelges A. J. Biol. Chem. 268:10274-10281(1993).

[17] Bishop R.E., Penfold S.S., Frost L.S., Holtje J.V., Weiner J.H. J. Biol. Chem. 270:23097-23103(1995).
[18] Flower D.R., North A.C.T., Attwood T.K. Biochem. Biophys. Res. Commun. 180:69-74(1991).
[19] Flower D.R. FEBS Lett. 333:99-102(1993). 

[0982] Cytosolic fatty-acid binding proteins signature (lip2)

A number of low molecular weight proteins which bind fatty acids and other organic anions are present in the cytosol [1,2]. Most of them are structurally related and have probably diverged from a common ancestor. This structure is a ten-stranded antiparallel beta-barrel, albeit with a wide discontinuity between the fourth and fifth strands, with a repeated +1 topology enclosing an internal ligand binding site [2,7]. Proteins known to belong to this family include: - Six tissue-specific types of fatty acid binding proteins (FABPs): liver, intestine, heart, epidermal, adipocyte and brain/retina. Heart FABP is also known as mammary-derived growth inhibitor (MDGI), a protein that reversibly inhibits proliferation of mammary carcinoma cells. Epidermal FABP is also known as psoriasis-associated FABP [3]. - Insect muscle fatty acid-binding proteins. - Testis lipid binding protein (TLBP). - Cellular retinol-binding proteins I and II (CRBP). - Cellular retinoic acid-binding protein (CRABP). - Gastrotropin, an ileal protein which stimulates gastric acid and pepsinogen secretion. It seems that gastrotropin binds to bile salts and bilirubins. - Fatty acid binding proteins MFB1 and MFB2 from the midgut of the insect Manduca sexta [4]. In addition to the above cytosolic proteins, this family also includes: - Myelin P2 protein, which may be a lipid transport protein in Schwann cells. P2 is associated with the lipid bilayer of myelin. - Schistosoma mansoni protein Sm14 [5], which seems to be involved in the transport of fatty acids. - Ascaris suum p18, a secreted protein that may play a role in sequestering potentially toxic fatty acids and their peroxidation products, or that may be involved in the maintenance of the impermeable lipid layer of the eggshell. - Hypothetical fatty acid-binding proteins F40F4.2, F40F4.3, F40F4.4 and ZK742.5 from Caenorhabditis elegans. As a signature pattern for these proteins, a segment from the N-terminal extremity was used.

[0983] Consensus pattern: [GSAIVK]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-[LIVMFY]-[LIVM]-x(2)-[LIVMAKR]

Note: it is suggested, on the basis of similarities of structure, function, and sequence, that this family forms an overall 
superfamily, called the calycins, with the lipocalin <PDOC00187> and avidin/streptavidin <PDOC00499> families [6,7].

[ 1] Bernier I., Jolles P. Biochimie 69:1127-1152(1987).

[ 2] Veerkamp J.H., Peeters R.A., Maatman R.G.H.J. Biochim. Biophys. Acta 1081:1-24(1991).

[ 3] Siegenthaler G., Hotz R., Chatellard-Gruaz D., Didierjean L., Hellman U., Saurat J.-H. Biochem. J. 302:363-371(1994).

[ 4] Smith A.F., Tsuchida K., Hanneman E., Suzuki T.C., Wells M.A. J. Biol. Chem. 267:380-384(1992).
[ 5] Moser D., Tendler M., Griffiths G., Klinkert M.-Q. J. Biol. Chem. 266:8447-8454(1991).
[ 6] Flower D.R., North A.C.T., Attwood T.K. Protein Sci. 2:753-761(1993).
[ 7] Flower D.R. FEBS Lett. 333:99-102(1993).

[0984] 338. Lipoxygenases Iron-binding region signatures 

Lipoxygenases (EC 1.13.11.-) are a class of iron-containing dioxygenases which catalyze the hydroperoxidation of lipids containing a cis,cis-1,4-pentadiene structure. They are common in plants, where they may be involved in a number of diverse aspects of plant physiology including growth and development, pest resistance, and senescence or responses to wounding [1]. In mammals a number of lipoxygenase isozymes are involved in the metabolism of prostaglandins and leukotrienes [2]. Sequence data are available for the following lipoxygenases: - Plant lipoxygenases (EC 1.13.11.12). Plants express a variety of cytosolic isozymes as well as what seems [3] to be a chloroplast isozyme. - Mammalian arachidonate 5-lipoxygenase (EC 1.13.11.34). - Mammalian arachidonate 12-lipoxygenase (EC 1.13.11.31). - Mammalian erythroid cell-specific 15-lipoxygenase (EC 1.13.11.33). The iron atom in lipoxygenases is bound by four ligands, three of which are histidine residues [4]. Six histidines are conserved in all lipoxygenase sequences; five of them are found clustered in a stretch of 40 amino acids. This region contains two of the three iron ligands; the other histidines have been shown [5] to be important for the activity of lipoxygenases. As signatures for this family of enzymes, two patterns in the region of the histidine cluster were selected. The first pattern contains the first three conserved histidines and the second pattern includes the fourth and the fifth.

Consensus pattern: H-[EQ]-x(3)-H-x-[LM]-[NQRC]-[GST]-H-[LIVMSTAC](3)-E [The second and third H's bind iron]
[0985] Consensus pattern: [LIVMA]-H-P-[LIVM]-x-[KRQ]-[LIVMF](2)-x-[AP]-H

[ 1] Vick B.A., Zimmerman D.C. (In) Biochemistry of plants: A comprehensive treatise, Stumpf P.K., Ed., Vol. 9, pp.53-90, Academic Press, New York, (1987).

[ 2] Needleman P., Turk J., Jakschik B.A., Morrison A.R., Lefkowith J.B. Annu. Rev. Biochem. 55:69-102(1986).
[ 3] Peng Y.L., Shirano Y., Ohta H., Hibino T., Tanaka K., Shibata D. J. Biol. Chem. 269:3755-3761(1994).
[ 4] Boyington J.C., Gaffney B.J., Amzel L.M. Science 260:1482-1486(1993).

[ 5] Steczko J., Donoho G.P., Clemens J.C., Dixon J.E., Axelrod B. Biochemistry 31:4053-4057(1992).
[0986] 339. Fumarate lyases signature (lyase_1) 

A number of enzymes, belonging to the lyase class, for which fumarate is a substrate have been shown [1,2] to share a short conserved sequence around a methionine which is probably involved in the catalytic activity of these enzymes. These enzymes are: - Fumarase (EC 4.2.1.2) (fumarate hydratase), which catalyzes the reversible hydration of fumarate to L-malate. There seem to be 2 classes of fumarases: class I enzymes are thermolabile and dimeric (as for example: Escherichia coli fumA and fumB); class II enzymes are thermostable and tetrameric and are found in prokaryotes (as for example: Escherichia coli fumC) as well as in eukaryotes. The sequences of the two classes of fumarases are not closely related. - Aspartate ammonia-lyase (EC 4.3.1.1) (aspartase), which catalyzes the reversible conversion of aspartate to fumarate and ammonia. This reaction is analogous to that catalyzed by fumarase, except that ammonia rather than water is involved in the trans-elimination reaction. - Argininosuccinase (EC 4.3.2.1) (argininosuccinate lyase), which catalyzes the formation of arginine and fumarate from argininosuccinate, the last step in the biosynthesis of arginine. - Adenylosuccinase (EC 4.3.2.2) (adenylosuccinate lyase) [3], which catalyzes the eighth step in the de novo biosynthesis of purines, the formation of 5'-phosphoribosyl-5-amino-4-imidazolecarboxamide and fumarate from 1-(5-phosphoribosyl)-4-(N-succinocarboxamide)-5-aminoimidazole. This enzyme can also catalyze the formation of fumarate and AMP from adenylosuccinate. - Pseudomonas putida 3-carboxy-cis,cis-muconate cycloisomerase (EC 5.5.1.2) (3-carboxymuconate lactonizing enzyme) (gene pcaB) [4], an enzyme involved in aromatic acid catabolism.
[0987] Consensus pattern: G-S-x(2)-M-x(2)-K-x-N

[ 1] Woods S.A., Shwartzbach S.D., Guest J.R. Biochim. Biophys. Acta 954:14-26(1988).
[ 2] Woods S.A., Miles J.S., Guest J.R. FEMS Microbiol. Lett. 51:181-186(1988).
[ 3] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).

[ 4] Williams S.E., Woolridge E.M., Ransom B.C., Landro J.A., Babbitt P.C., Kozarich J.W. Biochemistry 31:9768-9776(1992).

[0988] 340. MCM family signature and profile 

Proteins shown to be required for the initiation of eukaryotic DNA replication share a highly conserved domain of about 210 amino-acid residues [1,2,3]. The latter shows some similarities [4] with that of various other families of DNA-dependent ATPases. Eukaryotes seem to possess a family of six proteins that contain this domain. They were first identified in yeast, where most of them have a direct role in the initiation of chromosomal DNA replication by interacting directly with autonomously replicating sequences (ARS). They were thus called 'minichromosome maintenance proteins', with gene symbols prefixed by MCM. These six proteins are: - MCM2, also known as cdc19 (in S.pombe). - MCM3, also known as DNA polymerase alpha holoenzyme-associated protein P1, RLF beta subunit or ROA. - MCM4, also known as CDC54, cdc21 (in S.pombe) or dpa (in Drosophila). - MCM5, also known as CDC46 or nda4 (in S.pombe). - MCM6, also known as mis5 (in S.pombe). - MCM7, also known as CDC47 or Prolifera (in A.thaliana). This family is also present in archaebacteria. In Methanococcus jannaschii there are four members: MJ0363, MJ0961, MJ1489 and MJECL13. The presence of a putative ATP-binding domain implies that these proteins may be involved in an ATP-consuming step in the initiation of DNA replication in eukaryotes. As a signature pattern, a perfectly conserved region was selected that represents a special version of the B motif found in ATP-binding proteins.
[0989] Consensus pattern: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST]
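One rough way to gauge how selective a signature like the MCM pattern is: count how many distinct peptides it accepts. The sketch below is an illustration under stated assumptions: `count_matching_peptides` is a hypothetical helper, it handles only fixed-length patterns without exclusions, and it treats x as any of the 20 standard amino acids.

```python
import re

MCM = "G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST]"
# One element: a residue, x or [allowed], optionally followed by a fixed repeat (n).
ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?")


def count_matching_peptides(pattern: str) -> int:
    """Number of distinct peptides a fixed-length pattern accepts (20 choices per x)."""
    total = 1
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = m.group(1), int(m.group(2) or 1)
        if core == "x":
            choices = 20
        elif core.startswith("["):
            choices = len(core) - 2        # residues listed inside the brackets
        else:
            choices = 1                    # a single fixed residue
        total *= choices ** repeat
    return total


n = count_matching_peptides(MCM)
print(n, n / 20 ** 9)  # the MCM pattern spans 9 residues
```

For the MCM pattern this gives 2304 accepted 9-residue peptides. If, unrealistically, all 20 amino acids were equally frequent, a random 9-residue window would match with probability 2304/20^9, about 4.5e-9.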

[ 1] Coxon A., Maundrell K., Kearsey S.E. Nucleic Acids Res. 20:5571-5577(1992). 

[ 2] Hu B., Burkhart R., Schulte D., Musahl C., Knippers R. Nucleic Acids Res. 21:5289-5293(1993).

[ 3] Tye B.-K. Trends Cell Biol. 4:160-166(1994).

[ 4] Koonin E.V. Nucleic Acids Res. 21:2541-2547(1993). 

[0990] 341. Macrophage migration inhibitory factor family signature (MIF)

A protein called macrophage migration inhibitory factor (MIF) [1] seems to exert an important role in host inflammatory responses. It plays a pivotal role in the host response to endotoxic shock and appears to serve as a pituitary "stress" hormone that regulates systemic inflammatory responses. MIF is a secreted protein of 115 residues which is not processed from a larger precursor. D-dopachrome tautomerase [2] is a mammalian cytoplasmic enzyme involved in melanin biosynthesis that tautomerizes D-dopachrome with concomitant decarboxylation to give 5,6-dihydroxyindole (DHI). It is a protein of 117 residues, highly related to MIF. It must be noted that MIF binds glutathione and has been said to be related to glutathione S-transferases; this assertion was later disproved [3]. As a signature pattern for these proteins, a conserved region located in the central section was selected.
[0991] Consensus pattern: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G

[ 1] Bucala R. Immunol. Lett. 43:23-26(1994). 

[ 2] Odh G., Hindemith A., Rosengren A.-M., Rosengren E., Rorsman H. Biochem. Biophys. Res. Commun. 197:619-624(1993).

[ 3] Pearson W.R. Protein Sci. 3:525-527(1994). 
[0992] 342. MIP family signature 

Recently the sequences of a number of different proteins, which all seem to be transmembrane channel proteins, have been found to be highly related [1 to 4]. These proteins are listed below: - Mammalian major intrinsic protein (MIP). MIP is the major component of lens fiber gap junctions. Gap junctions mediate direct exchange of ions and small molecules from one cell to another. - Mammalian aquaporins [5]. These proteins form water-specific channels that provide the plasma membranes of red cells and kidney proximal and collecting tubules with high permeability to water, thereby permitting water to move in the direction of an osmotic gradient. - Soybean nodulin-26, a major component of the peribacteroid membrane induced during nodulation in legume roots after Rhizobium infection. - Plant tonoplast intrinsic proteins (TIP). There are various isoforms of TIP: alpha (seed), gamma, Rt (root), and Wsi (water-stress induced). These proteins may allow the diffusion of water, amino acids and/or peptides from the tonoplast interior to the cytoplasm. - Bacterial glycerol facilitator protein (gene glpF), which facilitates the movement of glycerol across the cytoplasmic membrane. - Salmonella typhimurium propanediol diffusion facilitator (gene pduF). - Yeast FPS1, a glycerol uptake/efflux facilitator protein. - Drosophila neurogenic protein 'big brain' (bib). This protein may mediate intercellular communication; it may function by allowing the transport of certain molecule(s) and thereby sending a signal for an ectodermal cell to become an epidermoblast instead of a neuroblast. - Yeast hypothetical protein YFL054c. - A hypothetical protein from the pepX region of Lactococcus lactis. The MIP family proteins seem to contain six transmembrane segments. Computer analysis shows that these proteins probably arose by a tandem, intragenic duplication event from an ancestral protein that contained three transmembrane segments. As a signature pattern, a well conserved region was selected which is located in a probable cytoplasmic loop between the second and third transmembrane regions.
[0993] Consensus pattern: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY]
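Signatures such as the MIP pattern are usually run over many sequences at once. The following Python sketch pairs a minimal FASTA reader with a hand translation of the pattern above into a regular expression; both helpers are hypothetical, the reader keeps only the first word of each header line, and the records are toy data.

```python
import re

# MIP signature from above, translated by hand into a regex.
MIP = re.compile(r"[HNQA].NP[STA][LIVMF][ST][LIVMF][GSTAFY]")


def parse_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA reader: first header word -> concatenated sequence."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def scan(text: str) -> dict[str, list[int]]:
    """Report 1-based start positions of the MIP signature in each record."""
    return {name: [m.start() + 1 for m in MIP.finditer(seq)]
            for name, seq in parse_fasta(text).items()}


toy = ">toy1 made-up fragment\nMAAHGN\nPAVTLGA\n>toy2\nMKKLLLL\n"
print(scan(toy))
```

Joining the wrapped sequence lines before searching matters: in the first toy record the signature spans a line break and would be missed if each line were scanned separately.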

[ 1] Reizer J., Reizer A., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:235-257(1993).
[ 2] Baker M.E., Saier M.H. Jr. Cell 60:185-186(1990).

[ 3] Pao G.M., Wu L.-F., Johnson K.D., Hoefte H., Chrispeels M.J., Sweet G., Sandal N.N., Saier M.H. Jr. Mol. Microbiol. 5:33-37(1991).

[ 4] Wistow G.J., Pisano M.M., Chepelinsky A.B. Trends Biochem. Sci. 16:170-171(1991).
[ 5] Chrispeels M.J., Agre P. Trends Biochem. Sci. 19:421-425(1994). 
[0994] 343. Mandelate racemase / muconate lactonizing enzyme family signatures

Mandelate racemase (EC 5.1.2.2) (MR) and muconate lactonizing enzyme (EC 5.5.1.1) (MLE) are two bacterial enzymes involved in aromatic acid catabolism. They catalyze mechanistically distinct reactions, yet they are related at the level of their primary, quaternary (homooctamer) and tertiary structures [1,2]. A number of other proteins also seem to be evolutionarily related to these two enzymes. These are: - The various plasmid-encoded chloromuconate cycloisomerases (EC 5.5.1.7). - Escherichia coli protein rspA [3]; rspA seems to be involved in the degradation of homoserine lactone (HSL) or of one of its metabolites. - Escherichia coli hypothetical protein ycjG. - Escherichia coli hypothetical protein yidU. - A hypothetical protein from Streptomyces ambofaciens [4]. Two signature patterns have been developed for these enzymes; both contain conserved acidic residues.

[0995] The second pattern contains an aspartate and a glutamate which are ligands for either a magnesium ion (in MR) or a manganese ion (in MLE).

[0996] Consensus pattern: A-x-[SAGCN]-[SAG]-[LIVM]-[DEQ]-x-A-[LA]-x-[DE]-[LIA]-x-[GA]-[KRQ]-x(4)-[PSA]-[LIV]-x(2)-L-[LIVMF]-G

Consensus pattern: [LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P [D and E bind a divalent metal ion]
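Because the second mandelate racemase pattern has only fixed-length gaps, the offsets of its metal-binding Asp and Glu within any match are constant. This Python sketch (hypothetical helper name; fixed-length patterns only, so a variable gap such as x(0,2) raises an error) computes those offsets from the pattern string.

```python
import re

MR_MLE_2 = "[LIVF]-x(2)-D-x-[NH]-x(7)-[ACL]-x(6)-[LIVMF]-x(7)-[LIVM]-E-[DENQ]-P"
# One element: a residue, x or [allowed], optionally followed by a fixed repeat (n).
ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Zx])(?:\((\d+)\))?")


def element_offsets(pattern: str) -> tuple[list[tuple[int, str]], int]:
    """Return (0-based offset, element) pairs and the total length of a fixed-length pattern."""
    offsets, pos = [], 0
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:  # variable gaps such as x(0,2) are rejected
            raise ValueError(f"unsupported element: {element!r}")
        core, repeat = m.group(1), int(m.group(2) or 1)
        offsets.append((pos, core))
        pos += repeat
    return offsets, pos


offsets, length = element_offsets(MR_MLE_2)
ligands = [(pos, core) for pos, core in offsets if core in ("D", "E")]
print(length, ligands)
```

For this pattern it reports a total length of 32 residues, with the Asp at offset 3 and the Glu at offset 29 (0-based), that is, 26 residues apart in every match.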

[ 1] Neidhart D.J., Kenyon G.L., Gerlt J.A., Petsko G.A. Nature 347:692-694(1990).

[ 2] Petsko G.A., Kenyon G.L., Gerlt J.A., Ringe D., Kozarich J.W. Trends Biochem. Sci. 18:372-376(1993).

[ 3] Huisman G.W., Kolter R. Science 265:537-539(1994). 

[ 4] Schneider D., Aigle B., Leblond R., Simonet J.M., Decaris B. J. Gen. Microbiol. 139:2559-2567(1993).
[0997] 344. Merozoite Surface Antigen 2 (MSA-2) family 

[0998] Thomas AW, Carr DA, Carter JM, Lyon JA. Mol Biochem Parasitol 1990;43:211-220.
[0999] 345. MSP (Major sperm protein) domain. 

[1000] Major sperm proteins are involved in sperm motility. These proteins oligomerise to form filaments. Partial matches to this domain are also found in other, non-MSP proteins. These include Swiss:P40075 and Swiss:P34593.
[1001] [1] Bullock TL, Roberts TM, Stewart M. J Mol Biol 1996;263:284-296. [2] King KL, Stewart M, Roberts TM, Seavy M. J Cell Sci 1992;101:847-857.

[1002] 346. (Matrix) Viral matrix protein. Found in morbilliviruses, paramyxoviruses and pneumoviruses. Number of members: 105

[1003] 347. O-methyltransferase (methyltransf)

[1004] This family includes a range of O-methyltransferases. These enzymes utilise S-adenosyl methionine. 
[1005] [1] Keller NP, Dischinger HC, Bhatnagar D, Cleveland TE, Ullah AH. Appl Environ Microbiol 1993;59:479-484.
[1006] 348. Magnesium chelatase, subunit ChlI

[1007] Magnesium chelatase is a three-component enzyme that catalyses the insertion of Mg2+ into protoporphyrin IX. This is the first unique step in the synthesis of (bacterio)chlorophyll. Due to this, it is thought that Mg-chelatase has an important role in channeling intermediates into the (bacterio)chlorophyll branch in response to conditions suitable for photosynthetic growth. ChlI and BchD have molecular weights between 38 and 42 kDa.

[1008] [1] Walker CJ, Willows RD. Biochem J 1997;327:321-333. [2] Petersen BL, Jensen PE, Gibson LC, Stummann BM, Hunter CN, Henningsen KW. J Bacteriol 1998;180:699-704.
[1009] 349. Plasmid recombination enzyme (Mob_Pre) 

[1010] With some plasmids, recombination can occur in a site-specific manner that is independent of RecA. In such cases, the recombination event requires another protein called Pre. Pre is a plasmid recombination enzyme. This protein is also known as Mob (conjugative mobilization).

[1011] [1] Priebe SD, Lacks SA. J Bacteriol 1989;171:4778-4784.

[1012] 350. Monooxygenase 

[1013] This family includes diverse enzymes that utilise FAD. 

[1014] [1] Gatti DL, Palfey BA, Lah MS, Entsch B, Massey V, Ballou DP, Ludwig ML. Science 1994;266:110-114.
[1015] 351. Mov34 family 

[1016] Members of this family are found in proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits and regulators of transcription factors.

[1017] [1] Aravind L, Ponting CP. Protein Sci 1998;7:1250-1254. [2] Hershey JW, Asano K, Naranda T, Vornlocher HP, Hanachi P, Merrick WC. Biochimie 1996;78:903-907.
[1018] 352. Myc amino-terminal region (Myc_N_term) 

[1019] The myc family belongs to the basic helix-loop-helix leucine zipper class of transcription factors; see HLH. Myc forms a heterodimer with Max, and this complex regulates cell growth through direct activation of genes involved in cell replication [2].

[1020] [1] Facchini LM, Penn LZ. FASEB J 1998;12:633-651. [2] Grandori C, Eisenman RN. Trends Biochem Sci
[1021] 353. (Metallothio_2) Metallothionein. Members of this family are metallothioneins. These proteins are cysteine [...]

[1022] [1] Medline: 98267202. Characterization of gene repertoires at mature stage of citrus fruits through random [...] metallothionein-like genes expressed during fruit development. [...] Kita M, Hisada S, Endo-Inagaki T, Omura M. Gene 1998;211:221-227.
[1023] 354. MAGE family 

[...] MAGE (melanoma antigen-encoding gene) family are expressed in a wide variety of tumors but not in [...] possibly [...] of the developing embryo. The cellular function of this family is unknown.

[...] Gatti [...] Mol Genet [...]

[1026] 355. Malic enzymes signature. Malic enzymes, or malate oxidoreductases, catalyze the oxidative decarboxylation [...] - NAD-dependent malic enzyme (EC 1.1.1.38), which uses preferentially NAD and has the ability to decarboxylate oxaloacetate (OAA). It is found in bacteria and insects. - NAD-dependent malic enzyme (EC 1.1.1.39), [...] NAD [...] to decarboxylate OAA. It is found in the mitochondria [...] related subunits. - NADP-dependent malic enzyme (EC 1.1.1.40), which has a preference [...] there are two isozymes: one mitochondrial and the other cytosolic. Plants also have two isozymes: chloroplastic and [...] sfcA, whose function is not yet known but which could be an NAD- or NADP-dependent malic enzyme. - Yeast hypothetical protein YKL029c, a probable malic enzyme. There are three well conserved regions in the enzyme sequences. Two of them seem to be involved in binding NAD or NADP. The significance of the third one, located in the central part [...] these enzymes [...]

[...] Consensus pattern: [...]-[LIVMA]-[GAST](2)-[LIVMF](2)
[ 1] [...] 82:225-233(1985). [ 2] Loeber G., Infante A.A., Maurer-Fogy I. [...] 266:3016-3021(1991). [ 3] Long J.J., Wang J.-L., Berry J.O. J. Biol. Chem. [...]

[1029] 356. (matrixin)

Matrixins cysteine switch (aka peptidase_M10) 

<PdSc0012^^^^ r''*"' (EC 3.4.24.-). also known as matrbcins [1] (see 

frn^?h f 2<nc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs 
from the mature enzyme by the presence of an N-temiinal propeptide. A high^ conserved octapeptide is found tS 

brtion of matrocns [2.3]; a cysteine within the octapeptide chelates the actK^e site zinc ion. thus inhibiting the en^Jme 
This region has been called the 'cysteine switch' or 'autoinhibitor region'. ^ 
[1031] A cysteine switch has been found in the following zinc proteases: 

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).

- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).

- MMP-3 (EC 3.4.24.17) (stromelysin-1).

- MMP-7 (EC 3.4.24.23) (matrilysin).

- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).

- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).

- MMP-10 (EC 3.4.24.22) (stromelysin-2).

- MMP-11 (EC 3.4.24.-) (stromelysin-3).

- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).

- MMP-13 (EC 3.4.24.-) (collagenase 3).

- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).

- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).

- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).

- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

All sequences known to belong to this class are detected by the pattern, except for cat MMP-7 and mouse MMP-11.
[ 1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[ 2] Sanchez-Lopez R., Nicholson R., Gesnel M.-C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899(1988).

[ 3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[ 4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).

[ 5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
[1033] 357. Vertebrate metallothioneins signature (metalthio) 

Metallothioneins (MTs) [1,2,3] are small proteins which bind heavy metals such as zinc, copper, cadmium, nickel, etc., through clusters of thiolate bonds. MTs occur throughout the animal kingdom and are also found in higher plants, fungi and some prokaryotes. On the basis of structural relationships, MTs have been subdivided into three classes. Class I includes mammalian MTs as well as MTs from crustaceans and molluscs with a clearly related primary structure. Class II groups together MTs from various species such as sea urchins, fungi, insects and cyanobacteria which display no, or only very distant, correspondence to class I MTs. Class III MTs are atypical polypeptides containing gamma-glutamylcysteinyl units. Vertebrate class I MTs are proteins of 60 to 68 amino acid residues; 20 of these residues are cysteines that bind 7 bivalent metal ions. As a signature pattern, a conserved region which spans 19 residues and contains 7 metal-binding cysteines was chosen; this region is located in the N-terminal section of class I MTs.
[1034] Consensus pattern: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K- 
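A consensus pattern of this kind can be read mechanically: x stands for any residue, a bracketed set lists the residues allowed at one position, and (n) or (n,m) gives a repeat count. As a minimal sketch (not PROSITE's own software), the hypothetical helper `prosite_to_regex` below translates such patterns into Python regular expressions; it also accepts {...} excluded sets from the general PROSITE syntax, which the patterns here do not use. The demo string is invented to fit the pattern and is not a real metallothionein; it simply confirms that the pattern spans 19 residues and requires 7 cysteines.

```python
import re

# Hypothetical helper: one PROSITE-style element, with an optional (n) or (n,m) repeat count.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    tokens = []
    # Patterns in this document end with a trailing '-', so strip it before splitting.
    for element in pattern.strip().strip("-").split("-"):
        m = ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            token = "."                          # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # any residue except those listed
        else:
            token = residue                      # single residue or [..] set
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        tokens.append(token)
    return "".join(tokens)

MT_PATTERN = "C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K-"
MT_REGEX = prosite_to_regex(MT_PATTERN)

demo = "CKCSQGCKCKECKCTSCKK"  # invented sequence, built only to fit the pattern
match = re.fullmatch(MT_REGEX, demo)
print(MT_REGEX)                                        # C.C[GSTAP].{2}C.C.{2}C.C.{2}C.K
print(match is not None, len(demo), demo.count("C"))  # True 19 7
```

Because (n,m) becomes {n,m}, the same translation also covers the variable-length patterns found elsewhere in this section.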

[ 1] Hamer D.H. Annu. Rev. Biochem. 55:913-951(1986). 

[ 2] Kagi J.H.R., Schaffer A. Biochemistry 27:8509-8515(1988).

[ 3] Binz P-A. Thesis, 1996, University of Zurich. 

[1035] 358. Mitochondrial energy transfer proteins signature (mito_carr) 

Different types of substrate carrier proteins involved in energy transfer are found in the inner mitochondrial membrane [1 to 5]. These are: - The ADP,ATP carrier protein (AAC) (ADP/ATP translocase), which exports ATP into the cytosol while importing ADP into the mitochondrial matrix. The sequence of AAC has been obtained from various mammalian, plant and fungal species. - The 2-oxoglutarate/malate carrier protein (OGCP), which exports 2-oxoglutarate into the cytosol and imports malate or other dicarboxylic acids into the mitochondrial matrix. This protein plays an important role in several metabolic processes such as the malate/aspartate and the oxoglutarate/isocitrate shuttles. - The phosphate carrier protein, which transports phosphate groups from the cytosol into the mitochondrial matrix. - The brown fat uncoupling protein (UCP), which dissipates oxidative energy into heat by transporting protons from the cytosol into the mitochondrial matrix. - The tricarboxylate transport protein (or citrate transport protein), which is involved in citrate-H+/malate exchange. It is important for the bioenergetics of hepatic cells as it provides a carbon source for fatty acid and sterol biosyntheses, and NAD for the glycolytic pathway. - The Grave's disease carrier protein (GDC), a protein recognized by IgG in patients with active Grave's disease. - Yeast mitochondrial proteins MRS3 and MRS4, which suppress a mitochondrial splice defect in the first intron of the COB gene and may act as carriers, exerting their suppressor activity by modulating solute concentrations in the mitochondrion. - Yeast protein ACR1 [6], which seems essential for acetyl-CoA synthetase activity. - Yeast protein PET8. - Yeast protein PMT. - Yeast protein RIM2. - Yeast

vcSlcJ^ii. ■ P"''®'" ^"^^^^ • ^^^^ P'°*«'" ^"^^2. - Yeast hypothetical proteins YBR291c, YEL006w 
YER053c, YFR045w, YHR002w and YIL006w. - Caenorhabditis elegans hypothetical protein K11H3.3. Two other proteins seem to belong to this family but are not located in the mitochondrial inner membrane: - Maize amyloplast Brittle-1 protein, which is found in the endosperm of kernels and could play a role in amyloplast membrane transport. - Candida boidinii peroxisomal membrane protein PMP47 [7]. PMP47 is an integral membrane protein of the peroxisome and may play a role as a transporter. These proteins all seem to be evolutionarily related. Structurally, they consist of three tandem repeats of a domain of approximately one hundred residues. Each of these domains contains two transmembrane regions. As a signature pattern, one of the most conserved regions in the repeated domain was selected, located just after the first transmembrane region.

[1036] Consensus pattern: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM]-
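Because the pattern sits just after the first transmembrane region of each repeated domain of about one hundred residues, scanning a carrier sequence for it gives a rough view of the tandem-repeat layout. A minimal sketch, assuming a hand translation of the pattern into a Python regex ('.' for x); the lookahead reports overlapping hits as well, and `carrier_motif_positions` and the toy sequence are hypothetical, not real carrier data. The number of hits is only a rough proxy for the number of repeats.

```python
import re

# P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM], translated by hand ('.' = any residue).
# Wrapping it in a lookahead reports every start position, including overlapping hits.
MITO_CARR = re.compile(r"(?=P.[DE].[LIVAT][RK].[LRH][LIVMFY][QGAIVM])")

def carrier_motif_positions(seq: str) -> list[int]:
    """0-based start positions of every signature hit in a protein sequence."""
    return [m.start() for m in MITO_CARR.finditer(seq)]

# Toy "carrier": three 110-residue units, each holding one motif copy (invented data).
motif = "PLDLIRSLLQ"
unit = "A" * 20 + motif + "A" * 80
toy = unit * 3

print(carrier_motif_positions(toy))  # [20, 130, 240]
```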

[ 1] Klingenberg M. Trends Biochem. Sci. 15:108-112(1990).

[ 2] Walker J.E. Curr. Opin. Struct. Biol. 2:519-526(1992).

[ 3] Kuan J., Saier M.H. Jr. CRC Crit. Rev. Biochem. 28:209-233(1993).

[ 4] Kuan J., Saier M.H. Jr. Res. Microbiol. 144:671-672(1993).

[ 5] Nelson D.R., Lawson J.E., Klingenberg M., Douglas M.G. J. Mol. Biol. 230:1159-1170(1993).
[ 6] Palmieri F. FEBS Lett. 346:48-54(1994).

[ 7] Jank B., Habermann B., Schweyen R.J., Link T.A. Trends Biochem. Sci. 18:427-428(1993).
[1037] 359. Prokaryotic molybdopterin oxidoreductases signatures (molybdopterin)

A number of different prokaryotic oxidoreductases that require and bind a molybdopterin cofactor have been shown [1,2,3] to share a number of regions of sequence similarity. These enzymes are: - Escherichia coli respiratory nitrate reductase (EC 1.7.99.4). This enzyme complex allows the bacteria to use nitrate as an electron acceptor during anaerobic growth. The enzyme is composed of three different chains: alpha, beta and gamma. The alpha chain (gene narG) is the molybdopterin-binding subunit. Escherichia coli encodes a second, closely related, nitrate reductase complex which also contains a molybdopterin-binding alpha chain (gene narZ). - Escherichia coli anaerobic dimethyl sulfoxide reductase (DMSO reductase). DMSO reductase is the terminal reductase during anaerobic growth on various sulfoxide and N-oxide compounds. DMSO reductase is composed of three chains: A, B and C. The A chain (gene dmsA) binds molybdopterin. - Escherichia coli biotin sulfoxide reductases (genes bisC and bisZ). This enzyme reduces a spontaneous oxidation product of biotin, BDS, back to biotin. It may serve as a scavenger, allowing the cell to use biotin sulfoxide as a biotin source. - Methanobacterium formicicum formate dehydrogenase (EC 1.2.1.2). The alpha chain (gene fdhA) of this dimeric enzyme binds a molybdopterin cofactor. - Escherichia coli formate dehydrogenases -H (gene fdhF), -N (gene fdnG) and -O (gene fdoG). These enzymes are responsible for the oxidation of formate to carbon dioxide. In addition to molybdopterin, the alpha (catalytic) subunit also contains an active-site selenocysteine. - Wolinella succinogenes polysulfide reductase chain. This enzyme is a component of the phosphorylative electron transport system with polysulfide as the terminal acceptor. It is composed of three chains: A, B and C. The A chain (gene psrA) binds molybdopterin. - Salmonella typhimurium thiosulfate reductase (gene phsA). - Escherichia coli trimethylamine-N-oxide reductase (EC 1.6.6.9) (gene torA) [4]. - Nitrate reductase (EC 1.7.99.4) from Klebsiella pneumoniae (gene nasA), Alcaligenes eutrophus, Escherichia coli, Rhodobacter sphaeroides, Thiosphaera pantotropha (gene napA) and Synechococcus PCC 7942 (gene narB). These proteins range in size from 715 amino acids (fdhF) to 1246 amino acids (narZ). Three signature patterns for these enzymes were derived. The first is based on a conserved region in the N-terminal section and contains two cysteine residues perhaps involved in binding the molybdopterin cofactor. It should be noted that this region is not present in bisC. The second pattern is derived from a conserved region located in the central part of these enzymes.

[1038] Consensus pattern: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-[DENQKHT]-

Consensus pattern: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E-

Consensus pattern: A-x(3)-[GDT]-I-x-[DNQTK]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-x(5)-A-x-[LIVMHST]-
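The first pattern contains two variable gaps, x(2,3) and x(3,4), so it can match stretches of different lengths, whereas the second is fixed-length. A minimal sketch with a hypothetical helper, `pattern_span`, that walks the pattern element by element and adds up the shortest and longest stretch of sequence each of the first two patterns can cover.

```python
import re

def pattern_span(pattern: str) -> tuple[int, int]:
    """Shortest and longest stretch of sequence a PROSITE-style pattern can cover."""
    shortest = longest = 0
    for element in pattern.strip().strip("-").split("-"):
        # A trailing '(n)' or '(n,m)' gives the repeat count; otherwise the element is one residue.
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element)
        n_min = int(m.group(1)) if m else 1
        n_max = int(m.group(2)) if m and m.group(2) else n_min
        shortest += n_min
        longest += n_max
    return shortest, longest

FIRST = "[STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-[DENQKHT]-"
SECOND = "[STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E-"

print(pattern_span(FIRST))   # (18, 20)
print(pattern_span(SECOND))  # (18, 18)
```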

[ 1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.R., Doyle W.A., Bray R.C. Biochim. Biophys. Acta 1057:157-185(1991).

[ 2] Bilous P.T., Cole S.T., Anderson W.R., Weiner J.H. Mol. Microbiol. 2:785-795(1988).
[ 3] Trieber C.A., Rothery R.A., Weiner J.H. J. Biol. Chem. 269:7103-7109(1994).

[ 4] Mejean V., Iobbi-Nivol C., Lepelletier M., Giordano G., Chippaux M., Pascal M.-C. Mol. Microbiol. 11:1169-1179(1994).

[1039] 360. Bacterial mutT domain signature

The bacterial mutT protein is involved in the GO system [1] responsible for removing an oxidatively damaged form of guanine (8-hydroxyguanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted opposite dA and dC residues of template DNA with almost equal efficiency, thus leading to A:T to C:G transversions. MutT specifically degrades 8-oxo-dGTP to the monophosphate with the concomitant release of pyrophosphate. MutT is a small protein of about 12 to 15 Kd. It has been shown [2,3] that a region of about 40 amino acid residues, found in the N-terminal part of mutT, can also be found in a variety of other prokaryotic, viral, and eukaryotic proteins.
These proteins are:
These proteins are: 

- Streptococcus pneumoniae mutX.

- A mutT homolog from plasmid pSAM2 of Streptomyces ambofaciens.
- Bartonella bacilliformis invasion protein A (gene invA).
- Escherichia coli dATP pyrophosphohydrolase.

- Protein D250 from African swine fever viruses.
- Proteins D9 and D10 from a variety of poxviruses.

- Mammalian 7,8-dihydro-8-oxoguanine triphosphatase (EC 3.6.1.-) [4].

- Mammalian diadenosine 5',5'''-P1,P4-tetraphosphate asymmetrical hydrolase (Ap4Aase) (EC 3.6.1.17) [5], which cleaves A-5'-PPPP-5'-A to yield AMP and ATP.

- A protein encoded on the antisense RNA of the basic fibroblast growth factor gene in higher vertebrates.
- Yeast protein YSA1.

- Escherichia coli hypothetical protein yfaO.
- Escherichia coli hypothetical protein ygdU and HI0901, the corresponding Haemophilus influenzae protein.

- Escherichia coli hypothetical protein yjaD and HI0432, the corresponding Haemophilus influenzae protein.

- Escherichia coli hypothetical protein yrfE.
- Bacillus subtilis hypothetical protein yqkG.

- Bacillus subtilis hypothetical protein yzgD.

- Yeast hypothetical protein YGL067w.

[1040] It is proposed [2] that the conserved domain could be involved in the active center of a family of pyrophosphate-releasing NTPases. As a signature pattern, the core region of the domain was selected; it contains four conserved glutamate residues.

[1041] Consensus pattern: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E- 
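The statement that the core region contains four conserved glutamates can be checked directly against the pattern. A minimal sketch, assuming a fixed-length pattern (every repeat count a single number, as here); `conserved_positions` is a hypothetical helper that returns the 0-based offsets at which a given residue is required.

```python
def conserved_positions(pattern: str, residue: str) -> list[int]:
    """0-based offsets at which a fixed-length PROSITE-style pattern requires `residue`."""
    offsets, pos = [], 0
    for element in pattern.strip().strip("-").split("-"):
        if element.endswith(")"):
            core, count = element[:-1].split("(")
            if "," in count:
                # A range such as x(2,3) would make every later offset ambiguous.
                raise ValueError("variable-length element: offsets would not be fixed")
            n = int(count)
        else:
            core, n = element, 1
        if core == residue:
            offsets.extend(range(pos, pos + n))
        pos += n
    return offsets

MUTT = "G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E-"
print(conserved_positions(MUTT, "E"))  # [6, 15, 18, 19]
```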

[1] Michaels M.L.. Miller J.H. J. Bacteriol. 174:6321-6325(1992). 
[2] Koonin E.V. Nucleic Acids Res. 21:4847-4347(1993). 

[3] Mejean V., Salles C., Bullions M.J., Bessman M.J., Claverys J.-P. Mol. Microbiol. 11:323-330(1994).

[4] Sakumi K., Furuichi M., Tsuzuki T., Kakuma T., Kawabata S., Maki H., Sekiguchi M. J. Biol. Chem. 268:23524-23530(1993).

[5] Thorne N.M.H., Hankin S., Wilkinson M.C., Nunez C., Barraclough R., McLennan A.G. Biochem. J. 311:717-721(1995).

[1042] 361. Myb DNA-binding domain repeat signatures 

The retroviral oncogene v-myb, and its cellular counterpart c-myb, encode nuclear DNA-binding proteins that specifically recognize the sequence YAAC(G/T)G [1]. The myb family also includes the following proteins: - Drosophila D-myb [2]. - Vertebrate myb-like proteins A-myb and B-myb [3]. - Maize C1 protein, a trans-acting factor which controls the expression of genes involved in anthocyanin biosynthesis. - Maize P protein [4], a trans-acting factor which regulates the biosynthetic pathway of a flavonoid-derived pigment in certain floral tissues. - Arabidopsis thaliana protein GL1 [5], required for the initiation of differentiation of leaf hair cells (trichomes). - A number of myb/C1-related proteins in maize and barley, whose roles are not yet known [6]. - Yeast BAS1 [7], a transcriptional activator for the HIS4 gene. - Yeast REB1 [8], which recognizes sites within both the enhancer and the promoter of rRNA transcription, as well as upstream of many genes transcribed by RNA polymerase II. - Fission yeast cdc5, a possible transcription factor whose activity is required for cell cycle progression and growth during G2. - Fission yeast mybl, which regulates telomere length and function. - Yeast hypothetical protein YMR213w. One of the most conserved regions in all of these proteins is a domain of 160 amino acids. It consists of three tandem repeats of 51 to 53 amino acids. In myb, this repeat region has been shown [9] to be involved in DNA-binding. The major part of the first repeat is missing in retroviral v-myb sequences and in plant myb-related proteins. Yeast REB1 differs from the other proteins in this family in having a single myb-like domain. As shown in the following schematic representation, two signature patterns for myb-like domains were developed; the first is located in the N-terminal section, the second spans the C-terminal extremity of the domain.

xxxxxxxxxWxxxEDxxxxxxxxxxxxxx WxxIxxxxxxRxxxxxxxxWxxxx 
********************* ': Position of the patterns. 



[1043] Consensus pattern: W-[ST]-x(2)-E-[DE]-x(2)-[LIV]-
Consensus pattern: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]-

Note: this pattern detects the three copies of the domain in myb, D-myb, A-myb and B-myb; the second of the two complete copies of plant myb-related proteins; and the last two copies of yeast BAS1.
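The second myb pattern contains a variable gap, x(4,5), which in a regular expression becomes `.{4,5}`. A minimal sketch, assuming hand translations of both patterns into Python regexes; the test fragments are invented and serve only to show which spacer lengths before the conserved R the second pattern accepts.

```python
import re

# Hand translations of the two myb-like domain patterns ('.' = any residue).
MYB_N = re.compile(r"W[ST].{2}E[DE].{2}[LIV]")                   # W-[ST]-x(2)-E-[DE]-x(2)-[LIV]
MYB_C = re.compile(r"W.{2}[LI][SAG].{4,5}R.{8}[YW].{3}[LIVM]")   # W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM]

def c_pattern_hit(gap: int) -> bool:
    """Does an invented fragment with a `gap`-residue spacer before the R match MYB_C?"""
    fragment = "WTKIS" + "A" * gap + "R" + "G" * 8 + "Y" + "KKK" + "L"
    return MYB_C.search(fragment) is not None

print(MYB_N.search("WSKKEDKKL") is not None)         # True
print([c_pattern_hit(gap) for gap in (3, 4, 5, 6)])  # [False, True, True, False]
```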

[ 1] Biedenkapp H., Borgmeyer U., Sippel A.E., Klempnauer K.-H. Nature 335:835-837(1988).

[ 2] Peters C.W.B., Sippel A.E., Vingron M., Klempnauer K.-H. EMBO J. 6:3085-3090(1987).

[ 3] Nomura N., Takahashi M., Matsui M., Ishii S., Date T., Sasamoto S., Ishizaki R. Nucleic Acids Res. 16:11075-11090(1988).

[ 4] Grotewold E., Athma R., Peterson T. Proc. Natl. Acad. Sci. U.S.A. 88:4587-4591(1991).

[ 5] Oppenheimer D.G., Herman P.L., Sivakumaran S., Esch J., Marks M.D. Cell 67:483-493(1991).

[ 6] Marocco A., Wissenbach M., Becker D., Paz-Ares J., Saedler H., Salamini F., Rohde W. Mol. Gen. Genet. 216:183-187(1989).

[ 7] Tice-Baldwin K., Fink G.R., Arndt K.T. Science 246:931-935(1989).
[ 8] Ju Q., Morrow B.E., Warner J.R. Mol. Cell. Biol. 10:5226-5234(1990).
[ 9] Klempnauer K.-H., Sippel A.E. EMBO J. 6:2719-2725(1987). 

[1044] 362. NAD-dependent glycerol-3-phosphate dehydrogenase signature 

NAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.1.8) (GPD) catalyzes the reversible reduction of dihydroxyacetone phosphate to glycerol-3-phosphate. It is a eukaryotic cytosolic homodimeric protein of about 40 Kd. As a signature pattern, a glycine-rich region that is probably [1] involved in NAD-binding was selected.

[1045] Consensus pattern: G-[ATHLIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-[LIVMFYW]-G-x-N-

[1046] [ 1] Otto J., Argos P., Rossmann M.G. Eur. J. Biochem. 109:325-330(1980).
[1047] 363. Nucleosome assembly protein (NAP) 

[1048] It is thought that NAPs may be involved in regulating gene expression as a result of histone accessibility [1]. 

[1] Rodriguez P, Munroe D, Prawitt D, Chu LL, Brie E, Kim J, Reid LH, Davies C, Nakagama H, Loebbert R, Winterpacht A, Petruzzi MJ, Higgins MJ, Nowak N, Evans G, Shows T, Weissman BE, Zabel B, Housman DE, Pelletier J; Genomics 1997;44:253-265.

[2] Schnieders R, Dork T, Arnemann J, Vogel T, Werner M, Schmidtke J; Hum Mol Genet 1996;5:1801-1807.

[1049] 364. NB-ARC domain 

van der Biezen EA, Jones JD; Curr Biol 1998;8:226-227.
[1050] 365. Nucleoside diphosphate kinases active site

[1051] Nucleoside diphosphate kinases (EC 2.7.4.6) (NDKs) [1] are enzymes required for the synthesis of nucleoside triphosphates (NTPs) other than ATP. They provide NTPs for nucleic acid synthesis, CTP for lipid synthesis, UTP for polysaccharide synthesis and GTP for protein elongation, signal transduction and microtubule polymerization. In eukaryotes, there seems to be a small family of NDK isozymes, each of which acts in a different subcellular compartment and/or has a distinct biological function. Eukaryotic NDK isozymes are hexamers of two highly related chains (A and B) [2]. By random association (A6, A5B, ..., AB5, B6), these two kinds of chain form isoenzymes differing in their isoelectric point. NDK chains are proteins of 17 Kd that act via a ping-pong mechanism in which a histidine residue is phosphorylated by transfer of the terminal phosphate group from ATP. In the presence of magnesium, the phosphoenzyme can transfer its phosphate group to any NDP to produce an NTP. NDK isozymes have been sequenced from prokaryotic and eukaryotic sources. It has also been shown [3] that the Drosophila awd (abnormal wing discs) protein is a microtubule-associated NDK. Mammalian NDK is also known as metastasis inhibition factor nm23. The sequence of NDK has been highly conserved through evolution. There is a single histidine residue conserved in all known NDK isozymes, which is involved in the catalytic mechanism [2]. The signature pattern contains this residue.
[1052] Consensus pattern: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE] [H is the putative active site residue]-
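The number of isoenzymes formed by random association of the A and B chains follows from simple combinatorics: a hexamer can contain 0 to 6 B chains, giving seven compositions from A6 to B6. A minimal sketch that enumerates them; the expected-share function additionally assumes, for illustration only, that A and B are equally abundant and associate independently, in which case the shares follow a binomial distribution. Both helper names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def _term(chain: str, k: int) -> str:
    # Notation used in the text: 'A5B' rather than 'A5B1', and no term when k == 0.
    return "" if k == 0 else chain if k == 1 else f"{chain}{k}"

def hexamer_isoenzymes(size: int = 6) -> list[str]:
    """Distinct A/B compositions of an oligomer of `size` subunits."""
    return [_term("A", c.count("A")) + _term("B", c.count("B"))
            for c in combinations_with_replacement("AB", size)]

def expected_share(n_a: int, p_a: float = 0.5, size: int = 6) -> float:
    """Binomial share of hexamers with n_a A chains (assumes independent random association)."""
    return comb(size, n_a) * p_a ** n_a * (1 - p_a) ** (size - n_a)

print(hexamer_isoenzymes())  # ['A6', 'A5B', 'A4B2', 'A3B3', 'A2B4', 'AB5', 'B6']
print(expected_share(3))     # 0.3125
```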

[ 1] Parks R., Agarwal R. (In) The Enzymes (3rd edition) 8:307-334(1973).

[ 2] Gilles A.-M., Presecan E., Vonica A., Lascu I. J. Biol. Chem. 266:8784-8789(1991).

[ 3] Biggs J., Hersperger E., Steeg P.S., Liotta L.A., Shearn A. Cell 63:933-940(1990).

[1053] 366. Nitrite and sulfite reductases iron-sulfur/siroheme-binding site (NIR_SIR)

Nitrite reductases (NiR) [1] catalyze the reduction of nitrite to ammonium, the second step in the assimilation of nitrate. There are two types of NiR: the higher plant chloroplastic form of NiR (EC 1.7.7.1) is a monomeric protein that uses reduced ferredoxin as the electron donor, while fungal and bacterial NiRs (EC 1.6.6.4) are homodimeric proteins that use NAD(P)H as the electron donor. Both forms of NiR contain a siroheme-Fe and iron-sulfur centers. Sulfite reductase (NADPH) (EC 1.8.1.2) (SIR) [2] is the bacterial enzyme that catalyzes the reduction of sulfite to sulfide. SIR is an oligomeric enzyme with a subunit composition of alpha(8)-beta(4); the alpha component is a flavoprotein (SIR-FP), while the beta component is a siroheme, iron-sulfur protein (SIR-HP). Sulfite reductase (ferredoxin) (EC 1.8.7.1) [3] is a cyanobacterial and plant monomeric enzyme that also catalyzes the reduction of sulfite to sulfide. Anaerobic sulfite reductase (EC 1.8.1.-) (ASR) [4] is a bacterial enzyme that catalyzes the NADH-dependent reduction of sulfite to sulfide. ASR is an oligomeric enzyme composed of three different subunits; the C component (gene asrC) seems to be a siroheme, iron-sulfur protein. These enzymes share a region of sequence similarity in their C-terminal half; this region, which spans about 80 amino acids, includes four conserved cysteine residues. Two of the cysteines are grouped together at the beginning of the domain, and the two others are grouped in the middle of the domain. The cysteines are involved in the binding of the iron-sulfur center; the last one also binds the siroheme group [2]. A signature pattern was derived from the region around the second cluster of cysteines.

[1054] Consensus pattern: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF] [The two C's are iron-sulfur ligands]-
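Capturing the two cysteines as regex groups shows where the putative iron-sulfur ligands fall within a hit. A minimal sketch, assuming a hand translation of the pattern into a Python regex; `fe_s_ligand_positions` and the short test fragment are hypothetical and not taken from a real reductase.

```python
import re

# [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF]; groups 1 and 2 are the two cysteines.
NIR_SIR = re.compile(r"[STV]G(C).{3}(C).{6}[DE][LIVMF][GAT][LIVMF]")

def fe_s_ligand_positions(seq: str) -> list[tuple[int, int]]:
    """1-based positions of the two cysteines in each non-overlapping match."""
    return [(m.start(1) + 1, m.start(2) + 1) for m in NIR_SIR.finditer(seq)]

fragment = "MKK" + "SGCAAACKKKKKKDLGL" + "MM"  # invented, built to fit the pattern
print(fe_s_ligand_positions(fragment))  # [(6, 10)]
```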
[ 1] Campbell W.H., Kinghorn J.R. Trends Biochem. Sci. 15:315-319(1990).
[ 2] Crane B.R., Siegel L.M., Getzoff E.D. Science 270:59-67(1995).

[ 3] Gisselmann G., Klausmeier P., Schwenn J.D. Biochim. Biophys. Acta 1144:102-106(1993).
[ 4] Huang C.J., Barrett E.L. J. Bacteriol. 173:1544-1553(1991).

[1055] 367. (NMT) Myristoyl-CoA:protein N-myristoyltransferase signatures

Myristoyl-CoA:protein N-myristoyltransferase (EC 2.3.1.97) (NMT) [1] is the enzyme responsible for transferring a myristate group onto the N-terminal glycine of a number of cellular eukaryotic and viral proteins. NMT is a monomeric protein of about 50 to 60 Kd whose sequence appears to be well conserved. Two highly conserved regions have been developed as signature patterns. The first one is located in the central section, the second in the C-terminal part.
[1056] Consensus pattern: E-I-N-F-L-C-x-H-K-
Consensus pattern: K-F-G-x-G-D-G- 

[1057] [ 1] Rudnick D.A., McWherter C.A., Gokel G.W., Gordon J.I. Adv. Enzymol. 67:375-430(1993).
[1058] 368. ADP-glucose pyrophosphorylase signatures (NTP_transferase)

[1059] ADP-glucose pyrophosphorylase (glucose-1-phosphate adenylyltransferase) [1,2] (EC 2.7.7.27) catalyzes a very important step in the biosynthesis of alpha 1,4-glucans (glycogen or starch) in bacteria and plants: the synthesis of the activated glucosyl donor, ADP-glucose, from glucose-1-phosphate and ATP. ADP-glucose pyrophosphorylase is a tetrameric, allosterically regulated enzyme. It is a homotetramer in bacteria, while in plant chloroplasts and amyloplasts it is a heterotetramer of two different, yet evolutionarily related, subunits. There are a number of conserved regions in the sequence of bacterial and plant ADP-glucose pyrophosphorylase subunits. Three of these regions were selected as signature patterns.

[1060] Consensus pattern: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV]-

Consensus pattern: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW]-

Consensus pattern: [APV]-[GS]-M-G-[LIVMN]-Y-[IVC]-[LIVMFY]-x(2)-[DENPHK]-

[ 1] Nakata P.A., Greene T.W., Anderson J.M., Smith-White B.J., Okita T.W., Preiss J. Plant Mol. Biol. 17:1089-1093

[ 2] Preiss J., Ball K., Hutney J., Smith-White B.J., Li L., Okita T.W. Pure Appl. Chem. 63:535-544(1991).

[1061] 369. Sodium/hydrogen exchanger family 

[1062] Na+/H+ antiporters are key transporters in maintaining the pH of actively metabolizing cells. The molecular mechanisms of antiport are unclear.

These antiporters contain 10-12 transmembrane regions (M) at the amino terminus and a large cytoplasmic region at the carboxyl terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6 and M7 regions are highly conserved; this is therefore thought to be the region involved in the transport of sodium and hydrogen ions. The cytoplasmic region has little similarity throughout the family.

[1] Dibrov P, Fliegel L; FEBS Lett 1998;424:1-5. [2] Orlowski J, Grinstein S; J Biol Chem 1997;272:22373-22376. [3] Numata M, Petrecca K, Lake N, Orlowski J; J Biol Chem 1998;273:6951-6959.
[1064] 370. Sodium:sulfate symporter family signature (Na_sulph_symp)

Integral membrane proteins that mediate the intake of a wide variety of molecules with the concomitant uptake of sodium ions (sodium symporters) fall into several families; this family consists of the following proteins: - Mammalian sodium/sulfate cotransporter [1]. - Mammalian renal sodium/dicarboxylate cotransporter [2], which transports succinate and citrate. - Mammalian intestinal sodium/dicarboxylate cotransporter. - Chlamydomonas reinhardtii putative sulfur deprivation response regulator SAC1 [3]. - Caenorhabditis elegans hypothetical proteins B0285.6, F31F6.6, K08E5.2 and m07 l
It D^^co^l.^'^^^'^' • H«e'^°Pf'""s influenzae hypothetical protein HI0608. - Synechocystis 

strain PCC 6803 hypothetical protein sll0640. - Methanococcus jannaschii hypothetical protein MJ0672. These proteins are highly hydrophobic and probably contain multiple transmembrane regions. As a signature pattern, a region located in or near the penultimate transmembrane region was selected.

[1065] Consensus pattern: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-[LIVMhV-

[ 1] Markovich D., Forgo J., Stange G., Biber J., Murer H. Proc. Natl. Acad. Sci. U.S.A. 90:8073-8077(1993).
[ 2] Pajor A.M. Am. J. Physiol. 270:642-648(1996).
[ 3] Davies J.P., Yildiz F.H., Grossman A. EMBO J. 15:2150-2159(1996).
[1066] 371. NifU-like domain

[1067] This is an alignment of the carboxy-terminal domain. This is the only common region between the NifU protein from nitrogen-fixing bacteria and rhodobacterial species. The biochemical function of NifU is unknown [1].
[1068] [1] Ouzounis C, Bork P, Sander C; Trends Biochem Sci 1994;19:199-200.
[1069] 372. Nitrilases / cyanide hydratase signatures 

Nitrilases (EC 3.5.5.1) are enzymes that convert nitriles into their corresponding acids and ammonia. They are widespread in microbes as well as in plants, where they convert indole-3-acetonitrile to the hormone indole-3-acetic acid. A conserved cysteine has been shown [1,2] to be essential for enzyme activity; it seems to be involved in a nucleophilic attack on the nitrile carbon atom. Cyanide hydratase (EC 4.2.1.66) converts HCN to formamide. In phytopathogenic fungi, it is used to avoid the toxic effect of cyanide released by wounded plants [3]. The sequence of cyanide hydratase is evolutionarily related to that of nitrilases. Yeast hypothetical proteins YIL164c and YIL165c also belong to this family. As signature patterns for these enzymes, two conserved regions were selected. The first is located in the N-terminal section, while the second, which contains the active site cysteine, is located in the central section.
[1070] Consensus pattern: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P-

Consensus pattern: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR] [C is the active site residue]-

[ 1] Kobayashi M., Izui H., Nagasawa T, Yamada H. Proc. Natl. Acad. Sci. U.S.A. 90:247-251(1993) 

[ 2] Kobayashi M., Komeda H., Yanaka N., Nagasawa T, Yamada H. J. Biol. Chem. 267:20746-20751(1992) 

[ 3] Wang P. Vanetten H.D. Biochem. Biophys. Res. Commun. 187:1048-1054(1992). 

[1071] 373. NusB family

[1072] The NusB protein is involved in the regulation of rRNA biosynthesis by transcriptional antitermination.
[1073] Huenges M, Rolz C, Gschwind R, Peteranderl R, Berglechner F, Richter G, Bacher A, Kessler H, Gemmecker G; EMBO J 1998;17:4092-4100.

[1074] 374. (Neur_chan) Neurotransmitter-gated ion-channels signature

Neurotransmitter-gated ion-channels [1,2,3,4] provide the molecular basis for rapid signal transmission at chemical synapses. They are post-synaptic oligomeric transmembrane complexes that transiently form an ionic channel upon the binding of a specific neurotransmitter. Presently, the sequences of subunits from five types of neurotransmitter-gated receptors are known: - The nicotinic acetylcholine receptor (AchR), an excitatory cation channel. In the motor endplates of vertebrates, it is composed of four different subunits (alpha, beta, gamma and delta or epsilon) with a molar stoichiometry of 2:1:1:1. In neurons, the AchR receptor is composed of two different types of subunits: alpha and non-alpha (also called beta). Nicotinic AchRs are also found in invertebrates. - The glycine receptor, an inhibitory chloride ion channel. The glycine receptor is a pentamer composed of two different subunits (alpha and beta). - The gamma-aminobutyric-acid (GABA) receptor, which is also an inhibitory chloride ion channel. The quaternary structure of the GABA receptor is complex; at least four classes of subunits are known to exist (alpha, beta, gamma, and delta) and there are many variants in each class (for example, six variants of the alpha class have already been sequenced). - The serotonin 5HT3 receptor. Serotonin is a biogenic hormone that functions as a neurotransmitter, a hormone and a mitogen. There are seven major groups of serotonin receptors; six of these groups (5HT1, 5HT2 and 5HT4 to 5HT7) transduce extracellular signals by activating G proteins, while 5HT3 is a ligand-gated cation-specific ion channel which, when activated, causes fast, depolarizing responses in neurons. - The glutamate receptor, an excitatory cation channel. Glutamate is the main excitatory neurotransmitter in the brain. At least three different types of glutamate receptors have been described and are named according to their selective agonists (kainate, N-methyl-D-aspartate (NMDA) and quisqualate). All known sequences of subunits from neurotransmitter-gated ion-channels are structurally related. They are composed of a large extracellular glycosylated N-terminal ligand-binding domain, followed by three hydrophobic transmembrane regions which form the ionic channel, followed by an intracellular region of variable length. A fourth hydrophobic region is found at the C-terminal end of the sequence. The sequences of subunits from the AchR, GABA, 5HT3 and Gly receptors are clearly evolutionarily related and share many regions of sequence similarity. These sequence similarities are either absent or very weak in the Glu receptors. In the N-terminal extracellular domain of AchR/GABA/5HT3/Gly receptors, there are two conserved cysteine residues which, in AchR, have been shown to form a disulfide bond essential to the tertiary structure of the receptor. A number of amino acids between the two disulfide-bonded cysteines are also conserved. Therefore this region was used as a signature pattern for this subclass of proteins.
[1075] Consensus pattern: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C [The two C's are linked by a disulfide bond]-
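Because both cysteines are fixed in the pattern, the size of the disulfide-closed loop follows directly: the two C's are 14 positions apart, so 13 residues lie between them. A minimal sketch, assuming a hand translation of the pattern into a Python regex; `cys_loop` and the test fragment are hypothetical.

```python
import re
from typing import Optional, Tuple

# C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C; groups 1 and 2 are the disulfide-bonded cysteines.
CYS_LOOP = re.compile(r"(C).[LIVMFQ].[LIVMF].{2}[FY]P.D.{3}(C)")

def cys_loop(seq: str) -> Optional[Tuple[int, int, int]]:
    """For the first hit: 0-based positions of both cysteines and the residues between them."""
    m = CYS_LOOP.search(seq)
    if m is None:
        return None
    first, second = m.start(1), m.start(2)
    return first, second, second - first - 1

print(cys_loop("MA" + "CKLKIKKYPKDKKKC"))  # (2, 16, 13)
```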



[ 1] Stroud R.M., McCarthy M.P., Shuster M. Biochemistry 29:11009-11023(1990).
[ 2] Betz H. Neuron 5:383-392(1990). 

[ 3] Dingledine R.. Myers S.J., Nicholas R.A. FASEB J. 4:2632-2645(1990). 
[ 4] Barnard E.A. Trends Biochem. Sci. 17:368-374(1992). 
[1076] 375. Orotidine 5'-phosphate decarboxylase active site

Orotidine 5'-phosphate decarboxylase (EC 4.1.1.23) (OMPdecase) [1,2] catalyzes the last step in the de novo biosynthesis of pyrimidines, the decarboxylation of OMP into UMP. In higher eukaryotes OMPdecase is part, with orotate phosphoribosyltransferase, of a bifunctional enzyme, while the prokaryotic and fungal OMPdecases are monofunctional proteins. Some parts of the sequence of OMPdecase are well conserved across species. The best conserved region is located in the N-terminal half of OMPdecases and is centered around a lysine residue which is essential for the catalytic function of the enzyme. This region has been developed as a signature pattern.

[1077] Consensus pattern: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA] [K is the active site residue]-

( 1] Jacquet M., Guilbaud R, Garreau H. Mol. Gen. Genet. 211:441-445(1988) 
[ 2] Kimsey H.H.. Kaiser D. J. Biol. Chem. 267:819-824(1992). 

[1078] 376. ATP synthase delta (OSCP) subunit signature

ATP synthase is found in the cytoplasmic membrane of bacteria, the inner membrane of mitochondria, and the thylakoid
membrane of chloroplasts. The ATPase complex is composed of a membrane-embedded proton channel, CF(0), and a
catalytic core, CF(1).

One of the subunits of the ATPase complex, known as subunit delta in bacteria and chloroplasts or the Oligomycin
Sensitivity Conferral Protein (OSCP) in mitochondria, seems to be part of the stalk that links CF(0) to CF(1). It either
transmits conformational changes from CF(0) into CF(1) or is involved in proton conduction [3].
The different delta/OSCP subunits are proteins of approximately 200 amino-acid residues (once the transit peptide
has been removed in the chloroplast and mitochondrial forms) which show only moderate sequence homology.

[1079] Consensus pattern: ...-[LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-x-...

[ 1] Futai M., Noumi T., Maeda M. Annu. Rev. Biochem. 58:111-136(1989).
[ 2] Senior A.E. Physiol. Rev. 68:177-231(1988).

[ 3] Engelbrecht S., Junge W. Biochim. Biophys. Acta 1015:379-390(1990).
[1080] 377. Aspartate and ornithine carbamoyltransferases signature 

Aspartate carbamoyltransferase (EC 2.1.3.2) (ATCase) catalyzes the conversion of aspartate and carbamoyl phosphate
to carbamoylaspartate, the second step in the de novo biosynthesis of pyrimidine nucleotides [1]. In prokaryotes
ATCase consists of two subunits: a catalytic chain (gene pyrB) and a regulatory chain (gene pyrI), while in eukaryotes
it is a domain of a multifunctional enzyme (called URA2 in yeast, rudimentary in Drosophila, and CAD in mammals
[2]) that also catalyzes other steps of the biosynthesis of pyrimidines.

Ornithine carbamoyltransferase (EC 2.1.3.3) (OTCase) catalyzes the conversion of ornithine and carbamoyl phosphate
to citrulline. In mammals this enzyme participates in the urea cycle [3] and is located in the mitochondrial matrix. In
prokaryotes and eukaryotic microorganisms it is involved in the biosynthesis of arginine. In some bacterial species it
is also involved in the degradation of arginine [4] (the arginine deiminase pathway).

It has been shown [5] that these two enzymes are evolutionarily related. The predicted secondary structures of both
enzymes are similar and there are some regions of sequence similarity. One of these regions includes three residues
which have been shown, by crystallographic studies [6], to be implicated in binding the phosphoryl group of carbamoyl
phosphate. This region was selected as a signature for these enzymes.

Consensus pattern: F-x-[EK]-x-S-[GT]-R-T [S, R, and the 2nd T bind carbamoyl phosphate]

- Note: the residue in position 3 of the pattern distinguishes an ATCase (Glu) from an OTCase (Lys).
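Because position 3 of this signature separates the two enzyme classes, the classification can be sketched as a regex search with a capture group on that position. This is a minimal illustration on synthetic sequences; the name `classify_carbamoyltransferase` is hypothetical, and a real annotation pipeline would also need a policy for proteins that contain more than one match (the sketch simply uses the first).

```python
import re
from typing import Optional

# Regex form of the signature F-x-[EK]-x-S-[GT]-R-T. The capture group holds
# pattern position 3: Glu (E) in ATCase, Lys (K) in OTCase.
_SIGNATURE = re.compile(r"F.([EK]).S[GT]RT")


def classify_carbamoyltransferase(sequence: str) -> Optional[str]:
    """Return 'ATCase', 'OTCase', or None if the signature is absent."""
    match = _SIGNATURE.search(sequence.upper())
    if match is None:
        return None
    return "ATCase" if match.group(1) == "E" else "OTCase"
```

For example, the made-up sequence `"MMFAEPSGRTAA"` is classified as `'ATCase'` and `"MMFAKPSTRTAA"` as `'OTCase'`.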

[ 1] Lerner C.G., Switzer R.L. J. Biol. Chem. 261:11156-11165(1986).

[ 2] Davidson J.N., Chen K.C., Jamison R.S., Musmanno L.A., Kern C.B. BioEssays 15:157-164(1993).

[ 3] Takiguchi M., Matsubasa T., Amaya Y., Mori M. BioEssays 10:163-166(1989).

[ 4] Baur H., Stalon V., Falmagne P., Luethi E., Haas D. Eur. J. Biochem. 166:111-117(1987).

[ 5] Houghton J.E., Bencini D.A., O'Donovan G.A., Wild J.R. Proc. Natl. Acad. Sci. U.S.A. 81:4864-4868(1984).

[ 6] Ke H.-M., Honzatko R.B., Lipscomb W.N. Proc. Natl. Acad. Sci. U.S.A. 81:4037-4040(1984).

[1081] 378. Oleosins signature 

Oleosins [1] are the proteinaceous components of plants' lipid storage bodies, called oil bodies. Oil bodies are small
droplets (0.2 to 1.5 µm in diameter) containing mostly triacylglycerol that are surrounded by a phospholipid/oleosin
annulus. Oleosins may have a structural role in stabilizing the lipid body during desiccation of the seed, by preventing
coalescence of the oil. They may also provide recognition signals for specific lipase anchorage in lipolysis during
seedling growth. Oleosins are found in the monolayer lipid/water interface of oil bodies and probably interact with both
the lipid and phospholipid moieties.

Oleosins are proteins of 16 Kd to 24 Kd and are composed of three domains: an N-terminal hydrophilic region of
variable length (from 30 to 60 residues); a central hydrophobic domain of about 70 residues; and a C-terminal
amphipathic region of variable length (from 60 to 100 residues). The central hydrophobic domain is proposed to be made up
of beta-strand structure and to interact with the lipids [2]. It is the only domain whose sequence is conserved, and
therefore a section from that domain was selected as a signature pattern.
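One way to locate the central hydrophobic domain described above is a sliding-window hydropathy scan. The sketch below assumes the Kyte-Doolittle hydropathy scale (the text does not name a scale) and a default window equal to the approximate 70-residue domain length; it returns the start of the most hydrophobic window and its mean score. The function name is hypothetical, and non-standard residue letters raise a `KeyError`.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic_window(sequence: str, window: int = 70) -> tuple[int, float]:
    """Return (0-based start, mean hydropathy) of the most hydrophobic window."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window <= 0 or len(values) < window:
        raise ValueError("window must be positive and no longer than the sequence")
    current = sum(values[:window])
    best_start, best_sum = 0, current
    # Slide one residue at a time: add the entering value, drop the leaving one.
    for start in range(1, len(values) - window + 1):
        current += values[start + window - 1] - values[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start, best_sum / window
```

On a synthetic sequence of 40 serines, 70 leucines and 60 lysines, the scan places the window at the leucine block (start 40, mean 3.8). The running sum keeps the scan linear in sequence length.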

[1082] Consensus pattern: [AG]-[ST]-x(2)-[AG]-x(2)-[LIVMQ]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-P-A

[ 1] Murphy D.J., Keen J.N., O'Sullivan J.N., Au D.M.Y., Edwards E.-W., Jackson P.J., Cummins I., Gibbons T.,
Shaw C.H., Ryan A.J. Biochim. Biophys. Acta 1088:86-94(1991).

[ 2] Tzen J.T.C., Lie G.C., Huang A.H.C. J. Biol. Chem. 267:15626-15634(1992).

[1083] 379. (Orbi VP5) Orbivirus outer capsid protein VP5

[1084] Reference [1] shows the location of the different capsid proteins and their relation to each other.
[1085] [1] Schoehn G, Moss SR, Nuttall PA, Hewat EA; Virology 1997;235:191-200.
[1086] 380. Orn/DAP/Arg decarboxylases family 2 signatures 

Pyridoxal-dependent decarboxylases acting on ornithine, lysine, arginine and related substrates can be classified into
two different families on the basis of sequence similarities [1,2,3]. The second family consists of:

- Eukaryotic ornithine decarboxylase (EC 4.1.1.17) (ODC). ODC catalyzes the transformation of ornithine into
putrescine.

- Prokaryotic diaminopimelic acid decarboxylase (EC 4.1.1.20) (DAPDC). DAPDC catalyzes the conversion of
diaminopimelic acid into lysine, the last step in the biosynthesis of lysine.

- Pseudomonas syringae pv tabaci protein tabA. tabA is probably involved in the biosynthesis of tabtoxin and is 
highly similar to DAPDC. 

- Bacterial and plant biosynthetic arginine decarboxylase (EC 4.1.1.19) (ADC). ADC catalyzes the transformation
of arginine into agmatine, the first step in the biosynthesis of putrescine from arginine.

The above proteins, while most probably evolutionarily related, do not share extensive regions of sequence similarity.
Two of the conserved regions were selected as signature patterns. The first pattern contains a conserved lysine residue
which is known, in mouse ODC [4], to be the site of attachment of the pyridoxal-phosphate group. The second pattern
contains a stretch of three consecutive glycine residues and has been proposed to be part of a substrate-binding region
[5].

These enzymes are collectively known as group IV decarboxylases [3].

[1087] Consensus pattern: [FY]-[PA]-x-K-[SACV]-[NHCLFW]-x(4)-[LIVMF]-[LIVMTA]-x(2)-[LIVMA]-x(3)-[GTE] [K is
the pyridoxal-P attachment site]

Consensus pattern: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-[GSTPCEQ]
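The two family-2 patterns translate directly into regular expressions: `x(4)` becomes `.{4}` and the variable-length gap `x(2,6)` becomes `.{2,6}`. The sketch below (function and dictionary key names are illustrative) reports, for the first pattern, the 0-based position of the lysine annotated as the pyridoxal-phosphate attachment site, and for the second pattern, the start of each match. Only non-overlapping matches are found.

```python
import re

# Hand translations of the two signatures. In the first one the lysine is
# wrapped in a capture group so its position can be read off directly.
PYRIDOXAL_SITE = re.compile(
    r"[FY][PA].(K)[SACV][NHCLFW].{4}[LIVMF][LIVMTA].{2}[LIVMA].{3}[GTE]")
GLYCINE_STRETCH = re.compile(
    r"[GS].{2,6}[LIVMSCP].{2}[LIVMF][DNS][LIVMCA]GGG[LIVMFY][GSTPCEQ]")


def scan_group_iv(sequence: str) -> dict[str, list[int]]:
    """Return 0-based positions: the attachment-site K for pattern 1 and
    the match start for pattern 2."""
    seq = sequence.upper()
    return {
        "pyridoxal_lysine": [m.start(1) for m in PYRIDOXAL_SITE.finditer(seq)],
        "glycine_stretch": [m.start() for m in GLYCINE_STRETCH.finditer(seq)],
    }
```

For the synthetic sequence `"MKGAALAAIDLGGGFGKK"`, the call returns `{'pyridoxal_lysine': [], 'glycine_stretch': [2]}`: the regex engine backtracks through the allowed gap lengths until the three glycines line up.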
[ 1] Bairoch A. Unpublished observations (1993).

[ 2] Martin C., Cami B., Yeh P., Stragier P., Parsot C., Patte J.-C. Mol. Biol. Evol. 5:549-559(1988).

[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).

[ 4] Poulin R., Lu L., Ackermann B., Bey P., Pegg A.E. J. Biol. Chem. 267:150-158(1992).

[ 5] Moore R.C., Boyle S.M. J. Bacteriol. 172:4631-4640(1990).

[1088] 381. Osteopontin signature

Osteopontin is an acidic phosphorylated glycoprotein of about 40 Kd which is abundant in the mineral matrix of bones 
and which binds tightly to hydroxyapatite [1,2,3]. It is suggested that osteopontin might function as a cell attachment 
factor and could play a key role in the adhesion of osteoclasts to the mineral matrix of bone.
Osteopontin-K is a kidney protein which is highly similar to osteopontin and probably also involved in cell adhesion.
As a signature pattern, a highly conserved region located at the N-terminal extremity of the mature protein was selected.
[1089] Consensus pattern: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K

[ 1] Butler W.T. Connect. Tissue Res. 23:123-136(1989).
[ 2] Gorski J.P. Calcif. Tissue Int. 50:391-396(1992).
[ 3] Denhardt D.T., Guo X. FASEB J. 7:1475-1482(1993).
[1090] 382. Oxysterol-binding protein family signature 

Proteins belonging to this family include:

- Yeast hypothetical protein YHR073w (996 residues). 

- Yeast hypothetical protein YKR003w (448 residues) 

[1092] Consensus pattern: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A

[1095] 384. Oxidoreductase FAD/NAD-binding domain
Number of members: 250 
[1]

Medline: 92084635 

Hyde GE, Crawford NM, Campbell WH;
J Biol Chem 1991;266:23542-23547.
[2] Medline: 95111952

Lu G, Campbell WH, Schneider G, Lindqvist Y;
Structure 1994;2:809-821.


[1099] 386. (Oxidored q1) NADH-ubiquinone/plastoquinone (complex I), various chains






EP1 033 405A2 



This family is part of complex I, which catalyses the transfer of two electrons from NADH to ubiquinone in a reaction
that is associated with proton translocation across the membrane.
Number of members: 1824
[1] 

Medline: 93110040 

The NADH:ubiquinone oxidoreductase (complex I) of respiratory chains. Walker JE;
Q Rev Biophys 1992;25:253-324.
[1100] 387. (oxidored q3) NADH-ubiquinone/plastoquinone oxidoreductase chain 6. 179 members. 
[1101] 388. (oxidored q5) NADH-ubiquinone oxidoreductase chain 4, amino terminus 
[1102] [1] Walker JE ; Q Rev Biophys 1992;25:253-324. 

[1103] 389. (oxidored q6) Respiratory-chain NADH dehydrogenase 20 Kd subunit signature
Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone oxidoreductase)
is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems to exist in the
chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30 polypeptide subunits
of this bioenergetic enzyme complex there is one with a molecular weight of 20 Kd (in mammals) [3], which is a
component of the iron-sulfur (IP) fragment of the enzyme. It seems to bind a 4Fe-4S iron-sulfur cluster. The 20 Kd
subunit has been found to be:

- Nuclear encoded, as a precursor form with a transit peptide, in mammals and in Neurospora crassa.
- Mitochondrial encoded in Paramecium (gene psbG).
- Chloroplast encoded in various higher plants (gene ndhK or psbG).

The 20 Kd subunit is highly similar to [4]: 

- Synechocystis strain PCC 6803 proteins psbG1 and psbG2. 

- Subunit B of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoB).

- Subunit NQO6 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

- Subunit 7 of Escherichia coli formate hydrogenlyase (gene hycG).
- Subunit I of Escherichia coli hydrogenase-4 (gene hyfI).

As a signature pattern, a highly conserved region located in the central section of this subunit was selected; it
contains a conserved cysteine that is probably involved in binding the 4Fe-4S center.

[1104] Consensus pattern: [GN]-x-D-[EAST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT] [The C is a putative 
4Fe-4S ligand] 

[ 1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[ 2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

[ 3] Arizmendi J.M., Runswick M.J., Skehel J.M.. Walker J.E. FEBS Lett. 301:237-242(1992). 

[ 4] Weidner U., Geier S., Ptock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[1105] 390. p53 tumor antigen signature

The p53 tumor antigen [1 to 5, E1, E2] is a protein found in increased amounts in a wide variety of transformed cells.
It is also detectable in many proliferating nontransformed cells, but it is undetectable or present at low levels in resting
cells. It is frequently mutated or inactivated in many types of cancer. p53 seems to act as a tumor suppressor in some,
but probably not all, tumor types. p53 is probably involved in cell cycle regulation, and may be a trans-activator that
acts to negatively regulate cellular division by controlling a set of genes required for this process.
p53 is a phosphoprotein of about 390 amino acids which can be subdivided into four domains: a highly charged acidic
region of about 75 to 80 residues, a hydrophobic proline-rich domain (positions 80 to 150), a central region (from 150
to about 300), and a highly basic C-terminal region. The sequence of p53 is well conserved in vertebrate species;
attempts to identify p53 in other eukaryotic phyla have so far been unsuccessful.
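The four-domain layout described above can be checked roughly from composition. The sketch below assumes the approximate boundaries given in the text (residues 1 to 80, 80 to 150, 150 to about 300, and the remainder), treated as 0-based, end-exclusive slices, and approximates net charge as (Lys + Arg) minus (Asp + Glu), ignoring histidine and the chain termini. The names `P53_REGIONS` and `region_profile` are hypothetical.

```python
# Approximate domain boundaries from the text (0-based, end-exclusive).
# These cut-offs are illustrative assumptions, not exact annotations.
P53_REGIONS = [
    ("acidic N-terminal", 0, 80),
    ("proline-rich", 80, 150),
    ("central", 150, 300),
    ("basic C-terminal", 300, None),
]


def region_profile(sequence: str) -> list[tuple[str, int, float]]:
    """Return (region, approximate net charge, proline fraction) per region."""
    seq = sequence.upper()
    profile = []
    for name, start, end in P53_REGIONS:
        segment = seq[start:end]
        charge = (segment.count("K") + segment.count("R")
                  - segment.count("D") - segment.count("E"))
        proline = segment.count("P") / len(segment) if segment else 0.0
        profile.append((name, charge, proline))
    return profile
```

A real p53 sequence would give a negative value for the first region, an elevated proline fraction for the second and a positive value for the last; the synthetic test sequence below exaggerates this to make the expected output exact.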

As a signature pattern for p53, a perfectly conserved stretch of 13 residues located in the central region of the protein
was selected. This region, known as domain IV in [3], is involved (along with an adjacent region) in the binding of the
large T antigen of SV40. In humans this region is the focus of a variety of point mutations in cancerous tumors.
[1106] Consensus pattern: M-C-N-S-S-C-M-G-G-M-N-R-R 

[ 1] Levine A.J., Momand J., Finlay C.A. Nature 351:453-456(1991).

[ 2] Levine A.J., Momand J. Biochim. Biophys. Acta 1032:119-136(1990).
[ 3] Soussi T., Caron de Fromentel C., May P. Oncogene 5:945-952(1990).
[ 4] Lane D.P., Benchimol S. Genes Dev. 4:1-8(1990).
[ 5] Ullrich S.J., Anderson C.W., Mercer W.E., Appella E. J. Biol. Chem. 267:15259-15262(1992).
[1107] 391. (P5CR) Delta 1-pyrroline-5-carboxylate reductase signature

Delta 1-pyrroline-5-carboxylate reductase (P5CR) (EC 1.5.1.2) [1,2] is the enzyme that catalyzes the terminal step in
the biosynthesis of proline from glutamate, the NAD(P)H-dependent reduction of 1-pyrroline-5-carboxylate to proline.
The sequences of P5CR from eubacteria (gene proC), archaebacteria and eukaryotes show only a moderate level of
overall similarity. As a signature pattern, the best conserved region, located in the C-terminal section of P5CR, was
selected.

[1108] Consensus pattern: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[STV]-x-[GAN]-G-x-T-x(2)-[AG]-...-[LMFHDENQK]

[ 1] Delauney A.J., Verma D.P. Mol. Gen. Genet. 221:299-305(1990).

[ 2] Savioz A., Jeenes D.J., Kocher H.P., Haas D. Gene 86:107-111(1990).

[1109] 392. Poly-adenylate binding protein, unique domain. 

[1110] 393. (PAL) Phenylalanine and histidine ammonia-lyases active site 

Phenylalanine ammonia-lyase (EC 4.3.1.5) (PAL) is a key enzyme of plant and fungal phenylpropanoid metabolism,
which is involved in the biosynthesis of a wide variety of secondary metabolites such as flavonoids, furanocoumarin
phytoalexins and cell wall components. These compounds have many important roles in plants during normal growth
and in responses to environmental stress. PAL catalyzes the removal of an ammonia group from phenylalanine to form
trans-cinnamate.

Histidine ammonia-lyase (EC 4.3.1.3) (histidase) catalyzes the first step in histidine degradation, the removal of an
ammonia group from histidine to produce urocanic acid.

The two types of enzymes are functionally and structurally related [1]. They are the only enzymes known to have the
modified amino acid dehydroalanine (DHA) in their active site. A serine residue has been shown [2,3,4] to be the
precursor of this essential electrophilic moiety. The region around this active site residue is well conserved and
can be used as a signature pattern.

[1111] Consensus pattern: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA] [S is the active site residue]

[ 1] Taylor R.G., Lambert M.A., Sexsmith E., Sadler S.J., Ray P.N., Mahuran D.J., McInnes R.R. J. Biol. Chem.
265:18192-18199(1990).

[ 2] Langer M., Reck G., Reed J., Retey J. Biochemistry 33:6462-6467(1994).

[ 3] Schuster B., Retey J. FEBS Lett. 349:252-254(1994). 

[ 4] Taylor R.G., McInnes R.R. J. Biol. Chem. 269:27473-27477(1994).

[1112] 394. PAS domain 

CAUTION: This family does not currently match all known examples of PAS domains.
PAS motifs appear in archaea, eubacteria and eukarya. Probably
the most surprising identification of a PAS domain was that in
EAG-like K+ channels [1,3].
Number of members: 308 
[1] 

Medline: 97446881 

PAS domain S-boxes in archaea, bacteria and sensors for oxygen and redox. 
Zhulin IB, Taylor BL, Dixon R; 
Trends Biochem Sci 1997;22:331-333. 
[2] Medline: 95275818

1.4 Å structure of photoactive yellow protein, a cytosolic photoreceptor: unusual fold, active site, and chromophore.
Borgstahl GE, Williams DR, Getzoff ED; 
Biochemistry 1995;34:6278-6287. 
[3] Medline: 98044337
PAS: a multifunctional domain family comes to light.
Ponting CP, Aravind L; 
Curr Biol 1997;7:674-677. 
[1113] 395. (PBP) Phosphatidylethanolamine-binding protein family signature 

Mammalian phosphatidylethanolamine-binding protein (also known as basic cytosolic 21 Kd protein) is a 186-residue
protein found in a variety of tissues [1]. It binds hydrophobic ligands, such as phosphatidylethanolamine, but also seems
[2] to bind nucleotides such as GTP and FMN. It is suggested that it could act in membrane remodeling during growth
and maturation. This protein belongs to a family that also includes:

- Drosophila antennal protein A5, a putative odorant-binding protein.

- Onchocerca volvulus antigen Ov-16 and the related proteins D1, D2 and D3.

- Plasmodium falciparum putative phosphatidylethanolamine-binding protein.

- Caenorhabditis elegans hypothetical protein F40A3.3.


[1114] Consensus pattern: [FYL]-x-[LV]-[LIVF]-x-[TIV]-[DC]-P-D-x-P-[SN]-x(10)-H

[ 2] Schoentgen F., Jolles P. FEBS Lett. 369:22-26(1995).
[1115] 396. PCI domain 

This domain has also been called the PINT motif (Proteasome,
Int-6, Nip-1 and TRIP-15) [1].
Number of members: 49 
[1] 

Medline: 98308842 

The PCI domain: a common theme in three multiprotein complexes 
Hofmann K, Bucher P; 

Trends Biochem Sci 1998;23:204-205.

[2] Medline: 98266368

Protein Sci 1998;7:1250-1254. 

Consensus pattern: ...-Q-x(2)-G-[FYWV]-x(3)-[AS]-P-[FYHDN]-x-...
[1118] [ 1] Kagan R.M., McFadden H.J., McFadden P.N., O'Connor C., Clarke S. Comp. Biochem. Physiol. 117B:
379-385(1997).

[1119] 398. (PCNA) Proliferating cell nuclear antigen signatures 


Consensus pattern: [RKA]-C-[DEHR]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-[LIVMF](2)
[ 1] Bravo R., Frank R., Blundell P.A., McDonald-Bravo H. Nature 326:515-517(1987).

[ 2] Suzuka I., Hata S., Matsuoka M., Kosugi S., Hashimoto J. Eur. J. Biochem. 195:571-575(1991).
[ 3] Bauer G.A., Burgers P.M.J. Nucleic Acids Res. 18:261-265(1990).

[ 4] O'Reilly D.R., Crawford A.M., Miller L.K. Nature 337:606-606(1989).

[1121] 399. (PDT) Prephenate dehydratase signatures

Prephenate dehydratase (EC 4.2.1.51) (PDT) catalyzes the decarboxylation and dehydration of prephenate into
phenylpyruvate. In microorganisms PDT is involved in the terminal pathway of the biosynthesis of phenylalanine. In
some bacteria, such as Escherichia coli, PDT is part of a bifunctional enzyme (P-protein) that also catalyzes the
transformation of chorismate into prephenate (chorismate mutase), while in other bacteria it is a monofunctional
enzyme. The sequences of monofunctional PDT align well with the C-terminal part of those of P-proteins [1].

As signature patterns for PDT, two conserved regions were selected. The first region contains a conserved threonine
which has been reported to be essential for the activity of the enzyme in E. coli. The second region includes a conserved
glutamate. Both regions are in the C-terminal part of PDT.

[1122] Consensus pattern: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM]
[1123] [ 1] Fischer R.S., Zhao G., Jensen R.A. J. Gen. Microbiol. 137:1293-1301(1991).
[1124] 400. PDZ domain (Also known as DHR or GLGF). 
[1125] PDZ domains are found in diverse signaling proteins.
[1126] [1] Ponting CP, Phillips C, Davies KE, Blake DJ; Bioessays 1997;19:469-479.

[2] Doyle DA, Lee A, Lewis J, Kim E, Sheng M, MacKinnon R; Cell 1996;85:1067-1076.

[3] Ponting CP; Protein Sci 1997;6:464-468. 

[1127] 401. (PPDK_N_term) PEP-utilizing enzymes signatures

A number of enzymes that catalyze the transfer of a phosphoryl group from phosphoenolpyruvate (PEP) via a
phosphohistidine intermediate have been shown to be structurally related [1,2,3,4]. These enzymes are:

- Pyruvate,orthophosphate dikinase (EC 2.7.9.1) (PPDK). PPDK catalyzes the reversible phosphorylation of
pyruvate and phosphate by ATP to PEP and diphosphate. In plants PPDK functions in the direction of the formation of
PEP, which is the primary acceptor of carbon dioxide in C4 and crassulacean acid metabolism plants. In some
bacteria, such as Bacteroides symbiosus, PPDK functions in the direction of ATP synthesis.

- Phosphoenolpyruvate synthase (EC 2.7.9.2) (pyruvate,water dikinase). This enzyme catalyzes the reversible
phosphorylation of pyruvate by ATP to form PEP, AMP and phosphate, an essential step in gluconeogenesis when
pyruvate and lactate are used as a carbon source.

- Phosphoenolpyruvate-protein phosphotransferase (EC 2.7.3.9). This is the first enzyme of the
phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), a major carbohydrate transport system in
bacteria. The PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation
across the cell membrane. The general mechanism of the PTS is the following: a phosphoryl group from PEP is
transferred to enzyme I (EI) of the PTS, which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr
then transfers the phosphoryl group to a sugar-specific permease.

All these enzymes share the same catalytic mechanism: they bind PEP and transfer the phosphoryl group from it to a
histidine residue. The sequence around that residue is highly conserved and can be used as a signature pattern for
these enzymes. As a second signature pattern, a conserved region was selected in the C-terminal part of the
PEP-utilizing enzymes. The biological significance of this region is not yet known.

[1128] Consensus pattern: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG] [H is phosphorylated]

Consensus pattern: ...-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-[LIVMF]-...

[ 1] Reizer J., Hoischen C., Reizer A., Pham T.N., Saier M.H. Jr. Protein Sci. 2:506-521(1993).
[ 2] Reizer J., Reizer A., Merrick M.J., Plunkett G. III, Rose D.J., Saier M.H. Jr. Gene 181:103-108(1996).
[ 3] Pocalyko D.J., Carroll L.J., Martin B.M., Babbitt P.C., Dunaway-Mariano D. Biochemistry 29:10757-10765(1990).

[ 4] Niersbach M., Kreuzaler F., Geerse R.H., Postma P.W., Hirsch H.J. Mol. Gen. Genet. 232:332-336(1992).
[1129] 402. (PEPCK ATP) Phosphoenolpyruvate carboxykinase (ATP) signature 

Phosphoenolpyruvate carboxykinase (ATP) (EC 4.1.1.49) (PEPCK) [1] catalyzes the formation of phosphoenolpyruvate
by decarboxylation of oxaloacetate while hydrolyzing ATP, a rate-limiting step in gluconeogenesis (the biosynthesis of
glucose).

The sequence of this enzyme has been obtained from Escherichia coli, yeast, and Trypanosoma brucei; these three
sequences are evolutionarily related and share many regions of similarity. As a signature pattern, a highly conserved
region was selected that contains four acidic residues and which is located in the central part of the enzyme. This
region lies close to the C-terminus of an ATP-binding motif 'A' (P-loop) (see <PDOC00017>) and is also part of the
ATP-binding domain [2].

[1130] Consensus pattern: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N

- Note: phosphoenolpyruvate carboxykinase (GTP) (EC 4.1.1.32), an enzyme that catalyzes the same reaction but
using GTP instead of ATP, is not related to the above enzyme (see <PDOC00421>).

[ 1] Medina V., Pontarollo R., Glaeske D., Tabel K., Goldie H. J. Bacteriol. 172:7151-7156(1990).
[ 2] Matte A., Goldie H., Sweet R.M., Delbaere L.T.J. J. Mol. Biol. 256:126-143(1996).

[1131] 403. Phosphoenolpyruvate carboxylase active sites

Phosphoenolpyruvate carboxylase (EC 4.1.1.31) (PEPcase) catalyzes the irreversible beta-carboxylation of
phosphoenolpyruvate by bicarbonate to yield oxaloacetate and phosphate. It is found in plants and in a variety of
microorganisms. A histidine [1] and a lysine [2] have been implicated in the catalytic mechanism of this enzyme. The
regions around these active site residues are highly conserved in PEPcase from various plants, bacteria and
cyanobacteria and can be used as signature patterns for this type of enzyme.

[1132] Consensus pattern: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH] [H is an active site residue]
Consensus pattern: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G [K is an active site residue]

[ 1] ... 202:797-803(1991).
[ 2] Jiao J.-A., Podesta F.E., Chollet R., O'Leary M.H., Andreo C.S. Biochim. Biophys. Acta 1041:291-295(1990).
[1134] 404. PET112 family signature

The following proteins from eukaryotes, prokaryotes and archaebacteria belong to the same family:

- Yeast protein PET112, which is required for the expression of mitochondrial genes, probably at the level of
translation.

- Aspergillus nidulans mitochondrial protein nempA. 

- Bacillus subtilis hypothetical protein yzdD.

- Moraxella catarrhalis hypothetical protein in bloR-1 3'region. 

- Mycoplasma genitalium hypothetical protein MG100.

- Methanococcus jannaschii hypothetical proteins MJ0019 and MJ0160.

The sizes of these proteins range from 419 to 630 amino acids. As a signature pattern, a conserved region located in
the N-terminal section was selected.
[1135] Consensus pattern: [DN]-x-[DN]-R-x(3)-P-L-[LIV]-E-[LIV]-x-[ST]-x-P
[1136] [ 1] Mulero J.J., Rosenthal J.K., Fox T.D. Curr. Genet. 25:299-304(1994).
[1137] 405. (PFK) Phosphofructokinase signature

Phosphofructokinase (EC 2.7.1.11) (PFK) [1,2] is a key regulatory enzyme in the glycolytic pathway. It catalyzes the
phosphorylation by ATP of fructose 6-phosphate to fructose 1,6-bisphosphate. In mammals PFK subunits are of about
80 Kd and are composed of two homologous domains which are highly related to the bacterial 36 Kd subunits. In
humans there are three tissue-specific types of PFK isozymes: PFKM (muscle), PFKL (liver), and PFKP (platelet). In
yeast PFK is an octamer composed of four 100 Kd alpha chains (gene PFK1) and four 100 Kd beta chains (gene PFK2);
like the mammalian 80 Kd subunits, the yeast 100 Kd subunits are composed of two homologous domains.

As a signature pattern, a conserved region containing three residues involved in fructose-6-phosphate binding was
selected.

[1138] Consensus pattern: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R [The R/K, the H and the Q/R are involved in
fructose-6-phosphate binding]

- Note: Escherichia coli has two phosphofructokinase isozymes, encoded by genes pfkA (major) and pfkB
(minor). The pfkB isozyme is not evolutionarily related to other prokaryotic or eukaryotic PFKs (see <PDOC00504>).

[ 1] Poorman R.A., Randolph A., Kemp R.G., Heinrikson R.L. Nature 309:467-469(1984).

[ 2] Heinisch J., Ritzel R.G., von Borstel R.C., Aguilera A., Rodicio R., Zimmermann F.K. Gene 78:309-321(1989).

[1139] 406. (PGAM) Phosphoglycerate mutase family phosphohistidine signature

Phosphoglycerate mutase (EC 5.4.2.1) (PGAM) and bisphosphoglycerate mutase (EC 5.4.2.4) (BPGM) are structurally
related enzymes which catalyze reactions involving the transfer of phospho groups between the three carbon atoms
of phosphoglycerate [1,2]. Both enzymes can catalyze three different reactions, although in different proportions:

- The isomerization of 2-phosphoglycerate (2-PGA) to 3-phosphoglycerate (3-PGA) with 2,3-diphosphoglycerate
(2,3-DPG) as the primer of the reaction.

- The synthesis of 2,3-DPG from 1,3-DPG with 3-PGA as a primer.

- The degradation of 2,3-DPG to 3-PGA (phosphatase EC 3.1.3.13 activity).

In mammals there are two isozymes of PGAM: the M (muscle) and B (brain) forms. In yeast, PGAM is a tetrameric
protein. BPGM is a dimeric protein and is found mainly in erythrocytes, where it plays a major role in regulating
hemoglobin oxygen affinity as a consequence of controlling 2,3-DPG concentration.

The catalytic mechanism of both PGAM and BPGM involves the formation of a phosphohistidine intermediate [3].

The bifunctional enzyme 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase (EC 2.7.1.105 and EC 3.1.3.46)
(PF2K) [4] catalyzes both the synthesis and the degradation of fructose-2,6-bisphosphate. PF2K is an important
enzyme in the regulation of hepatic carbohydrate metabolism. Like PGAM/BPGM, the fructose-2,6-bisphosphatase
reaction involves a phosphohistidine intermediate, and the phosphatase domain of PF2K is structurally related to
PGAM/BPGM.


A signature pattern was built around the phosphohistidine residue.

[1140] Consensus pattern: [LIVM]-x-R-H-G-[EQ]-x(3)-N [H is the phosphohistidine residue]

- Note: a form of PGAM that is independent of 2,3-DPG is not related to this family.

[ 1] Le Boulch P., Joulin V., Garel M.-C., Rosa J., Cohen-Solal M. Biochem. Biophys. Res. Commun. 156:874-881.

[ 2] White M.F., Fothergill-Gilmore L.A. FEBS Lett. 229:383-387(1988).
[ 3] Rose Z.B. Meth. Enzymol. 87:43-51(1982).

[ 4] Bazan J.F., Fletterick R.J., Pilkis S.J. Proc. Natl. Acad. Sci. U.S.A. 86:9642-9646(1989).

[ 5] O'Toole G.A., Trzebiatowski J.R., Escalante-Semerena J.C. J. Biol. Chem. 269:26503-26511(1994).

[ 6] Grana X., De Lecea L., El-Maghrabi M.R., Urena J.M., Caellas C., Carreras J., Puigdomenech P., Pilkis S.J.,
Climent F. J. Biol. Chem. 267:12797-12803(1992).

[1141] 407. (PGI) Phosphoglucose isomerase signatures

Phosphoglucose isomerase (EC 5.3.1.9) (PGI) [1,2] is a dimeric enzyme that catalyzes the reversible isomerization of
glucose-6-phosphate and fructose-6-phosphate. PGI is involved in different pathways: in most higher organisms it is
involved in glycolysis; in mammals it is involved in gluconeogenesis; in plants in carbohydrate biosynthesis; in some
bacteria it provides a gateway for fructose into the Entner-Doudoroff pathway. PGI has been shown [3] to be identical
to neuroleukin, a neurotrophic factor which supports the survival of various types of neurons.
The sequence of PGI from many species ranging from bacteria to mammals is available and has been shown to be
highly conserved. As signature patterns for this enzyme, two conserved regions were selected: the first is located in
the central section of PGI, while the second is located in its C-terminal section.

[1142] Consensus pattern: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G
Consensus pattern: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K

[ 1] Achari A., Marshall S.E., Muirhead H., Palmieri R.H., Noltmann E.A. Philos. Trans. R. Soc. Lond., B, Biol. Sci.
293:145-157(1981).

[ 2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:544-545(1992).

[ 3] Faik P., Walker J.I.H., Redmill A.A.M., Morgan M.J. Nature 332:455-456(1988).

[1143] 408. (PGK) Phosphoglycerate kinase signature 

Phosphoglycerate kinase (EC 2.7.2.3) (PGK) [1] is an enzyme of the second phase of glycolysis: it catalyzes the
conversion of 1,3-diphosphoglycerate to 3-phosphoglycerate with generation of one molecule of ATP. PGK is found in
all living organisms and its sequence has been highly conserved throughout evolution. It is a two-domain protein;
each domain is composed of six repeats of an alpha/beta structural motif. As a signature pattern for PGK, a conserved
region in the N-terminal section was selected.
Consensus pattern: [KRHGTCVN]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P
[1144] [ 1] Watson H.C., Littlechild J.A. Biochem. Soc. Trans. 18:187-190(1990).

[1145] 409. (PGM PMM) Phosphoglucomutase and phosphomannomutase phosphoserine signature

- Phosphoglucomutase (EC 5.4.2.2) (PGM). PGM is an enzyme responsible for the conversion of D-glucose
1-phosphate into D-glucose 6-phosphate. PGM participates in both the breakdown and synthesis of glucose [1].

- Phosphomannomutase (EC 5.4.2.8) (PMM). PMM is an enzyme responsible for the conversion of D-mannose
1-phosphate into D-mannose 6-phosphate. PMM is required for different biosynthetic pathways in bacteria. For
example, in enterobacteria such as Escherichia coli there are two different genes coding for this enzyme: rfbK,
which is involved in the synthesis of the O antigen of lipopolysaccharide, and cpsG, which is required for the synthesis
of the M antigen capsular polysaccharide [2]. In Pseudomonas aeruginosa PMM (gene algC) is involved in the
biosynthesis of the alginate layer [3], and in Xanthomonas campestris (gene xanA) it is involved in the biosynthesis
of xanthan [4]. In Rhizobium strain NGR234 (gene noeK) it is involved in the biosynthesis of the nod factor.

- Phosphoacetylglucosamine mutase (EC 5.4.2.3), which converts N-acetyl-D-glucosamine 1-phosphate into the
6-phosphate isomer.

The catalytic mechanism of both PGM and PMM involves the formation of a phosphoserine intermediate [1]. The
sequence around the serine residue is well conserved and can be used as a signature pattern.

In addition to PGM and PMM there are at least three uncharacterized proteins that belong to this family [5,6]: 

- Urease operon protein ureC from Helicobacter pylori.

- Escherichia coli protein mrsA.

- Paramecium tetraurelia parafusin, a phosphoglycoprotein involved in exocytosis 

- A Methanococcus vannielii hypothetical protein in the 3' region of the gene for ribosomal protein S10.

[1146] Consensus pattern: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GN]-E [S is the phosphoserine residue]

- Note: fungal PMMs do not belong to this family.

[ 1] Dai J.B., Liu Y., Ray W.J. Jr., Konno M. J. Biol. Chem. 267:6322-6337(1992).

[ 2] Stevenson G., Lee S.J., Romana L.K., Reeves P.R. Mol. Gen. Genet. 227:173-180(1991).

[ 3] Zielinski N.A., Chakrabarty A.M., Berry A. J. Biol. Chem. 266:9754-9763(1991).

[ 4] Koeplin R., Arnold W., Hoette B., Simon R., Wang G., Puehler A. J. Bacteriol. 174:191-199(1992).

[ 5] Bairoch A. Unpublished observations (1993). 

[ 6] Subramanian S.V., Wyroba E., Andersen A.R., Satir B.H. Proc. Natl. Acad. Sci. U.S.A. 91:9832-9836(1994).
[1147] 410. PH domain profile 

The pleckstrin homology (PH) domain is a domain of about 100 residues that occurs in a wide range of proteins
involved in intracellular signaling or as constituents of the cytoskeleton [1 to 7].

The function of this domain is not clear; several putative functions have been suggested:

- binding to the beta/gamma subunit of heterotrimeric G proteins;

- binding to lipids, e.g. phosphatidylinositol-4,5-bisphosphate;
- binding to phosphorylated Ser/Thr residues;
- attachment to membranes by an unknown mechanism.

It is possible that different PH domains have totally different ligand requirements.
The 3D structure of several PH domains has been determined [8]. All known cases have a common structure
consisting of two perpendicular anti-parallel beta sheets, followed by a C-terminal amphipathic helix. The loops
connecting the beta-strands differ greatly in length, making the PH domain relatively difficult to detect. There are no
totally invariant residues within the PH domain.

Proteins reported to contain one or more PH domains belong to the following families:

- Pleckstrin, the protein where this domain was first detected, is the major substrate of protein kinase C in platelets.
Pleckstrin is one of the rare proteins to contain two PH domains.

- Ser/Thr protein kinases such as the Akt/Rac family, the beta-adrenergic receptor kinases, the mu isoform of PKC
and the trypanosomal NrkA family.

- Tyrosine protein kinases belonging to the Btk/Itk/Tec subfamily.
- Insulin Receptor Substrate 1 (IRS-1).

- Regulators of small G-proteins like guanine nucleotide releasing factor GNRP (Ras-GRF) (which contains 2 PH
domains), GTPase-activating proteins like rasGAP and BEM2/IPL2, and the human breakpoint cluster protein bcr.

- Cytoskeletal proteins such as dynamin, Caenorhabditis elegans kinesin-like protein unc-104, spectrin beta-chain,
syntrophin (2 PH domains) and yeast nuclear migration protein NUM1.

- Mammalian phosphatidylinositol-specific phospholipase C (PI-PLC) (see <PDOC50007>) isoforms gamma and
delta. Isoform gamma contains two PH domains; the second one is split into two parts separated by about 400
residues.

- Oxysterol binding proteins OSBP, yeast OSH1 and YHR073w.

- Mouse protein citron, a putative rho/rac effector that binds to the GTP-bound forms of rho and rac.

- Yeast proteins involved in cell cycle regulation and bud formation, like BEM2, BEM3 and BUD4.

- Caenorhabditis elegans protein MIG-10.

- Caenorhabditis elegans hypothetical proteins C04D8.1, K06H7.4 and ZK632.12.

- Yeast hypothetical proteins YBR129c and YHR155w.

The profile for the PH domain, which has been developed by Toby Gibson at the EMBL, covers the total length of the
domain. Several proteins contain large insertions in the PH domain and are thus difficult to detect with this profile. In
some of these cases, the profile will align only to one half of the PH domain.

- Sequences known to belong to this class detected by the profile: ALL.

[ 1] Mayer B.J., Ren R., Clark K.L., Baltimore D. Cell 73:629-630(1993).
[ 2] Haslam R.J., Koide H.B., Hemmings B.A. Nature 363:309-310(1993).

[ 3] Musacchio A., Gibson T.J., Rice P., Thompson J., Saraste M. Trends Biochem. Sci. 18:343-348(1993).
[ 4] Gibson T.J., Hyvonen M., Musacchio A., Saraste M., Birney E. Trends Biochem. Sci. 19:349-353(1994).
[ 5] Pawson T. Nature 373:573-580(1995).
[ 6] Ingley E., Hemmings B.A. J. Cell. Biochem. 56:436-443(1994).
[ 7] Saraste M., Hyvonen M. Curr. Opin. Struct. Biol. 5:403-408(1995).
[ 8] Riddihough G. Nat. Struct. Biol. 1:755-757(1994).

411. PHD-finger
[1] 

Medline: 95216093 

The PHD finger: implications for chromatin-mediated transcriptional regulation 
Aasland R, Gibson TJ, Stewart AF;

Trends Biochem Sci 1995;20:56-59.
Number of members: 181 

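412. Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), a eukaryotic intracellular enzyme, plays an important
role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.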

Two profiles were developed, one covering the X-box. the other the Y-box. 

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).
[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).
[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1149] 413. (PI-PLC-Y) Phosphatidylinositol-specific phospholipase C profiles

Phosphatidylinositol-specific phospholipase C (EC 3.1.4.11), a eukaryotic intracellular enzyme, plays an important
role in signal transduction processes [1]. It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-3,4,5-triphosphate
into the second messenger molecules diacylglycerol and inositol-1,4,5-triphosphate. This catalytic process is tightly
regulated by reversible phosphorylation and binding of regulatory proteins [2 to 4].

In mammals there are at least 6 different isoforms of PI-PLC; they differ in their domain structure, their regulation and
their tissue distribution. Lower eukaryotes also possess multiple isoforms of PI-PLC.

All eukaryotic PI-PLCs contain two regions of homology, sometimes referred to as 'X-box' and 'Y-box'. The order of
these two regions is always the same (NH2-X-Y-COOH), but the spacing is variable. In most isoforms, the distance
between these two regions is only 50-100 residues, but in the gamma isoforms one PH domain, two SH2 domains and
one SH3 domain are inserted between the two PLC-specific domains. The two conserved regions have been shown
to be important for the catalytic activity. At the C-terminal of the Y-box, there is a C2 domain (see <PDOC00380>)
possibly involved in Ca-dependent membrane attachment.

Profile analysis shows that sequences with significant similarity to the X-box domain occur also in prokaryotic and
trypanosome PI-specific phospholipases C. Apart from this region, the prokaryotic enzymes show no similarity to their
eukaryotic counterparts.

Two profiles were developed, one covering the X-box. the other the Y-box. 

[ 1] Meldrum E., Parker P.J., Carozzi A. Biochim. Biophys. Acta 1092:49-71(1991).

[ 2] Rhee S.G., Choi K.D. Adv. Second Messenger Phosphoprotein Res. 26:35-61(1992).

[ 3] Rhee S.G., Choi K.D. J. Biol. Chem. 267:12393-12396(1992).

[ 4] Sternweis P.C., Smrcka A.V. Trends Biochem. Sci. 17:502-506(1992).

[1150] 414. (PK) Pyruvate kinase active site signature

Pyruvate kinase (EC 2.7.1.40) (PK) [1] catalyzes the final step in glycolysis, the conversion of phosphoenolpyruvate
to pyruvate with the concomitant phosphorylation of ADP to ATP. PK requires both magnesium and potassium ions for
its activity. PK is found in all living organisms. In vertebrates there are four tissue-specific isozymes: L (liver), R (red
cells), M1 (muscle, heart and brain) and M2 (early fetal tissues). In Escherichia coli there are two isozymes: PK-I
(gene pykF) and PK-II (gene pykA). All PK isozymes seem to be tetramers of identical subunits of about 500 amino
acid residues.

As a signature pattern for PK, a conserved region was selected that includes a lysine residue, which seems to be the
acid/base catalyst responsible for the interconversion of pyruvate and enolpyruvate, and a glutamic acid residue
implicated in the binding of the magnesium ion.

[1151] Consensus pattern: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQHS]-[GSTA]-[LIVM] [K is the
active site residue] [E is a magnesium ligand]

[1152] [1] Muirhead H. Biochem. Soc. Trans. 18:193-196(1990).
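The consensus patterns in this section are written in PROSITE-style syntax: positions separated by hyphens, x for any residue, square brackets for allowed alternatives, braces for excluded residues, and (n) or (n,m) for repeat counts. As a minimal sketch under those assumptions (not PROSITE's own scanning software), the hypothetical helpers prosite_to_regex and find_sites below translate such a pattern into a Python regular expression and scan a sequence with it, using the pyruvate kinase pattern above. The test sequence is invented for illustration, the pattern is assumed to be well formed, and only non-overlapping hits are reported.

```python
import re

# Hypothetical helper: one PROSITE-style element, e.g. "x", "[LIVM](2)",
# "{P}", "x(2,4)", optionally preceded by "<" or followed by ">".
_ELEMENT = re.compile(r"(<?)(.+?)(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":
            rx = "."                        # any residue
        elif core[0] == "[" and core[-1] == "]":
            rx = core                       # one of the listed residues
        elif core[0] == "{" and core[-1] == "}":
            rx = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            rx = re.escape(core)            # a single fixed residue
        if low is not None:                 # repeat count (n) or (n,m)
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + rx + ("$" if end else ""))
    return "".join(parts)


def find_sites(pattern: str, sequence: str):
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    rx = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in rx.finditer(sequence)]


# Pyruvate kinase active site signature, as given in paragraph [1151].
PK_PATTERN = ("[LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-"
              "[DEQHS]-[GSTA]-[LIVM]")

if __name__ == "__main__":
    toy = "MKNIKIISKIENHEGVRR"  # invented test sequence, not a real PK
    for start, segment in find_sites(PK_PATTERN, toy):
        # The K is the 6th residue covered by the pattern (offset 5).
        print(start, segment, "active-site K at", start + 5)
```

On the toy sequence the script reports one hit starting at residue 4, with the putative active-site lysine at residue 9. A real scan would also need to report overlapping matches, for example by wrapping the expression in a lookahead.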

[1153] 415. (PLDc) Phospholipase D active site motif

Phosphatidylcholine-hydrolyzing phospholipase D (PLD) isoforms are activated by ADP-ribosylation factors (ARFs).
PLD produces phosphatidic acid from phosphatidylcholine, which may be essential for the formation of certain types
of transport vesicles or may be constitutive vesicular transport to signal transduction pathways.
PC-hydrolyzing PLD is a homologue of cardiolipin synthase, phosphatidylserine synthase, bacterial PLDs and viral
proteins.

Each of these appears to possess a domain duplication, which is apparent from the presence of two motifs containing
well-conserved histidine, lysine, aspartic acid and/or asparagine residues which may contribute to the active site. An
E. coli endonuclease (nuc) and similar proteins appear to be PLD homologues but possess only one of these motifs.
The profile contained here represents only the putative active site regions, since an accurate multiple alignment of the
repeat units has not been achieved.

Number of members: 139
[1]

Medline: 96303814

A novel family of phospholipase D homologues that includes phospholipid synthases and putative endonucleases:
identification of duplicated repeats and potential active site residues.
Ponting CP, Kerr ID;

Protein Sci 1996;5:914-922. 

[2] Medline: 96334293
A duplicated catalytic motif in a new superfamily of phosphohydrolases and phospholipid synthases that includes
poxvirus envelope proteins.
Koonin EV; 

Trends Biochem Sci 1996;21:242-243. 

[3] Medline: 94327597

Cloning and expression of phosphatidylcholine-hydrolyzing phospholipase D from Ricinus communis L.
Wang X, Xu L, Zheng L;
J Biol Chem 1994;269:20312-20317.
[4] Medline: 97386825

Regulation of eukaryotic phosphatidylinositol-specific phospholipase C and phospholipase D
Singer WD, Brown HA. Sternweis PC; 
Annu Rev Biochem 1997;66:475-509. 
[1154] 416. (PMI type I) Phosphomannose isomerase type I signatures

Phosphomannose isomerase (EC 5.3.1.8) (PMI) [1,2] is the enzyme that catalyzes the interconversion of mannose-
6-phosphate and fructose-6-phosphate. In eukaryotes, it is involved in the synthesis of GDP-mannose, which is a con-
stituent of N- and O-linked glycans as well as GPI anchors. In prokaryotes, it is involved in a variety of pathways
including capsular polysaccharide biosynthesis and D-mannose metabolism.

Three classes of PMI have been defined on the basis of sequence similarities [1]. The first class comprises all known
eukaryotic PMIs as well as the enzyme encoded by the manA gene in enterobacteria such as Escherichia coli. Class I
PMIs are proteins of about 42 to 50 Kd which bind a zinc ion essential for their activity.

As signature patterns for class I PMI, two conserved regions were selected. The first one is located in the N-terminal
section of these proteins, the second in the C-terminal half. Both patterns contain a residue involved [3] in the binding
of the zinc ion.
[1155] Consensus pattern: Y-x-D-x-N-H-K-P-E [E is a zinc ligand] 

- Consensus pattern: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K [H is a zinc ligand]

[ 1] Proudfoot A.E.I., Turcatti G., Wells T.N.C., Payton M.A., Smith D.J. Eur. J. Biochem. 219:415-423(1994).
[ 2] Coulin P., Magnenat E., Proudfoot A.E.I., Payton M.A., Scully P., Wells T.N.C. Biochemistry 32:14139-14144
(1993).

[ 3] Cleasby A., Wonacott A., Skarzynski T., Hubbard R.E., Davies G.J., Proudfoot A.E.I., Bernard A.R., Payton
M.A., Wells T.N.C. Nat. Struct. Biol. 3:470-479(1996).

[1156] 417. (PNP UDP 1) Purine and other phosphorylases family 1 signature 
The following phosphorylases belong to the same family:

- Purine nucleoside phosphorylase (EC 2.4.2.1) (PNP) from most bacteria (gene deoD). This enzyme catalyzes the
cleavage of guanosine or inosine to respective bases and sugar-1-phosphate molecules [1].

- Uridine phosphorylase (EC 2.4.2.3) (UdRPase) from bacteria (gene udp) and mammals. Catalyzes the cleavage
of uridine into uracil and ribose-1-phosphate. The products of the reaction are used either as carbon and energy
sources or in the rescue of pyrimidine bases for nucleotide synthesis [2].

- 5'-methylthioadenosine phosphorylase (EC 2.4.2.28) (MTA phosphorylase) from Sulfolobus solfataricus [3]. 

As a signature pattern, a conserved region was selected in the central part of these enzymes. 
[1157] Consensus pattern: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L

- Note: mammalian and some bacterial PNPs, as well as eukaryotic MTA phosphorylase, belong
to a different family of phosphorylases (see <PDOC00954>).

[ 1] Takehara M., Ling P., Izawa S., Inoue Y., Kimura A. Biosci. Biotechnol. Biochem. 59:1987-1990(1995).

[ 2] Watanabe S.-I., Hino A., Wada K., Eliason J.R., Uchida T. J. Biol. Chem. 270:12191-12196(1995).

[ 3] Cacciapuoti G., Porcelli M., Bertoldo C., De Rosa M., Zappia V. J. Biol. Chem. 269:24762-24769(1994).

[1158] 418. (PP2C) Protein phosphatase 2C signature

Protein phosphatase 2C (PP2C) is one of the four major classes of mammalian serine/threonine specific protein phos-
phatases (EC 3.1.3.16). PP2C [1] is a monomeric enzyme of about 42 Kd which shows broad substrate specificity and
is dependent on divalent cations (mainly manganese and magnesium) for its activity. Its exact physiological role is still
unclear. Three isozymes are currently known in mammals: PP2C-alpha, -beta and -gamma. In yeast, there are at least
four PP2C homologs: phosphatase PTC1 [2], which has weak tyrosine phosphatase activity in addition to its activity
on serines, phosphatases PTC2 and PTC3, and hypothetical protein YBR125c. Isozymes of PP2C are also known
from Arabidopsis thaliana (ABI1, PPH1), Caenorhabditis elegans (FEM-2, F42G9.1, T23F11.1), Leishmania chagasi
and Paramecium tetraurelia.

In Arabidopsis thaliana, the kinase associated protein phosphatase (KAPP) [3] is an enzyme that dephosphorylates
the Ser/Thr receptor-like kinase RLK5 and that contains a C-terminal PP2C domain.

PP2C does not seem to be evolutionarily related to the main family of serine/threonine phosphatases: PP1, PP2A and
PP2B. However, it is significantly similar to the catalytic subunit of pyruvate dehydrogenase phosphatase (EC 3.1.3.43)
(PDPC) [4], which catalyzes dephosphorylation and concomitant reactivation of the alpha subunit of the E1 component
of the pyruvate dehydrogenase complex. PDPC is a mitochondrial enzyme and, like PP2C, is magnesium-dependent.
As a signature pattern, the best conserved region was selected, which is located in the N-terminal part and contains a
perfectly conserved tripeptide. This region includes a conserved aspartate residue involved in divalent cation binding
[5].

[1159] Consensus pattern: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV]

Note: PP2C belongs [6] to a superfamily which also includes bacterial proteins such as Bacillus spoIIE, rsbU and
rsbW, Synechocystis PCC 6803 icfG, as well as a domain in fungal adenylate cyclases.

[ 1] Wenk J., Trompeter H.-I., Pettrich K.-G., Cohen P.T.W., Campbell D.G., Mieskes G. FEBS Lett. 297:135-138
(1992).

[ 2] Maeda T., Tsai A.Y.M., Saito H. Mol. Cell. Biol. 13:5408-5417(1993).

[ 3] Stone J.M., Collinge M.A., Smith R.D.. Horn M.A., Walker J.C. Science 266:793-795(1994). 

[ 4] Lawson J.E., Niu X.-D., Browning K.S., Trong H.L., Yan J., Reed L.J. Biochemistry 32:8987-8993(1993).

[ 5] Das A.K., Helps N.R., Cohen P.T.W., Barford D. EMBO J. 15:6798-6809(1996).

[ 6] Bork P., Brown N.P., Hegyi H., Schultz J. Protein Sci. 5:1421-1425(1996).

[1160] 419. (PPTA) Protein prenyltransferases alpha subunit repeat signature 

Protein prenyltransferases catalyze the transfer of an isoprenyl moiety to a cysteine four residues from the C-terminus
of several proteins. They are heterodimeric enzymes consisting of alpha and beta subunits. The alpha subunit is thought
to participate in a stable complex with the isoprenyl substrate; the beta subunit binds the peptide substrate. Distinct
protein prenyltransferases might share a common alpha subunit. Both the alpha and beta subunits show repetitive
sequence motifs [1]. These repeats have distinct structural and functional implications and are unrelated to each other.
Known protein prenyltransferase alpha subunits are:

- Mammalian protein farnesyltransferase alpha subunit.

- Yeast protein RAM2, a protein farnesyltransferase alpha subunit.

- Yeast protein BET4, a protein geranylgeranyltransferase alpha subunit.

The conserved domain of the alpha subunit consists of about 34 amino acids and is repeated five times. It contains
an invariant tryptophan possibly involved in heterodimerization with the conserved phenylalanines in the repeated
domains of the beta subunits via hydrophobic interactions. The signature pattern for this domain is centered on the
invariant tryptophan.

[1161] Consensus pattern: [PSIAV]-x-[NDFV]-[NEQIY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR]
[1162] [ 1] Boguski M.S., Murray A.W., Powers S. New Biol. 4:408-411(1992).
[1163] 420. (PR55) Protein phosphatase 2A regulatory subunit PR55 signatures 

Protein phosphatase 2A (PP2A) is a serine/threonine phosphatase involved in many aspects of cellular function,
including the regulation of metabolic enzymes and proteins involved in signal transduction. PP2A is a trimeric enzyme
that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit (PR65), also called
subunit A; this complex then associates with a third, variable subunit (subunit B), which confers distinct properties on
the holoenzyme [1]. One of the forms of the variable subunit is a 55 Kd protein (PR55) which is highly conserved in
mammals (where three isoforms are known to exist), Drosophila and yeast (gene CDC55). This subunit could perform
a substrate recognition function or be responsible for targeting the enzyme complex to the appropriate subcellular
compartment.

As signature patterns, two perfectly conserved sequences of 15 residues were selected; one is located in the N-terminal
region, the other in the center of the protein.

[1164] Consensus pattern: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N

Consensus pattern: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D

[1165] [ 1] Mayer-Jaekel R., Hemmings B.A. Trends Cell Biol. 4:287-291(1994).






EP 1 033 405 A2 






[1166] 421. N-(5'-phosphoribosyl)anthranilate (PRA) isomerase
[1] Wilmanns M, Priestle JP, Niermann T, Jansonius JN;
J Mol Biol 1992;223:477-507.
[1167] 422. (PRK) Phosphoribulokinase signature

Phosphoribulokinase (EC 2.7.1.19) (PRK) [1,2] is one of the enzymes specific to the Calvin's reductive pentose phos-
phate cycle, which is the major route by which carbon dioxide is assimilated and reduced by autotrophic organisms.
PRK catalyzes the ATP-dependent phosphorylation of ribulose 5-phosphate into ribulose 1,5-bisphosphate, which is
the substrate for RubisCO. PRKs of diverse origins show different properties with respect to the size of the protein,
the subunit structure, or the enzymatic regulation. However, an alignment of the sequences of PRK from plants, algae,
photosynthetic and chemoautotrophic bacteria shows that there are a few regions of sequence similarity. As a signature
pattern, one of these regions was selected.
[1168] Consensus pattern: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E

[ 1] Kossmann J., Klintworth R., Bowien B. Gene 85:247-252(1989). 
[ 2] Gibson J.L., Chen J.-H., Tower P.A., Tabita F.R. Biochemistry 29:8085-8093(1990).

[1169] 423. (PRPP synt) Phosphoribosyl pyrophosphate synthetase signature

Phosphoribosyl pyrophosphate synthetase (EC 2.7.6.1) (PRPP synthetase) catalyzes the formation of PRPP from ATP
and ribose 5-phosphate. PRPP is then used in various biosynthetic pathways, as for example in the formation of purines,
pyrimidines, histidine and tryptophan. PRPP synthetase requires inorganic phosphate and magnesium ions for its
stability and activity.

In mammals, three isozymes of PRPP synthetase are found; in yeast there are at least four isozymes.
As a signature pattern for this enzyme, a very conserved region was selected that has been suggested to be involved
in binding divalent cations [1]. This region contains two conserved aspartic acid residues as well as a histidine, which
are all potential ligands for a cation such as magnesium.

[1170] Consensus pattern: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D

[1171] [1] Bower S.G., Harlow K.W., Switzer R.L., Hove-Jensen B. J. Biol. Chem. 264:10287-10291(1989).

[1172] 424. (PRTP) Herpesvirus processing and transport protein 

The members of this family associate with capsid intermediates during packaging of the virus.
Number of members: 31

[1] 

Medline: 98362148 

Herpes simplex virus type 1 cleavage and packaging proteins
UL15 and UL28 are associated with B but not C capsids during
packaging. Yu D, Weller SK;

J Virol 1998;72:7428-7439. 
[1173] 425. Photosystem I psaG / psaK (PSI PSAK) proteins signature 

Photosystem I (PSI) [1] is an integral membrane protein complex that uses light energy to mediate electron transfer
from plastocyanin to ferredoxin. It is found in the chloroplasts of plants and in cyanobacteria. PSI is composed of at least
14 different subunits, two of which, PSI-G (gene psaG) and PSI-K (gene psaK), are small hydrophobic proteins of about
7 to 9 Kd and evolutionarily related [2]. Both seem to contain two transmembrane regions. Cyanobacteria seem to
encode only PSI-K.

[1174] As a signature pattern, the best-conserved region was selected, which seems to correspond to the second
transmembrane region.
- Consensus pattern: [GT]-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-x-[LIVM]-[GA]
[1] Golbeck J.H. Biochim. Biophys. Acta 895:167-204(1987).

[2] Kjaerulff S., Andersen B., Nielsen V.S., Moller B.L., Okkels J.S. J. Biol. Chem. 268:18912-18916(1993).
[1175] 426. PTR2 family proton/oligopeptide symporters signatures 

A family of eukaryotic and prokaryotic proteins that seem to be mainly involved in the intake of small peptides with the
concomitant uptake of a proton has been recently characterized [1,2]. Proteins that belong to this family are:

- Fungal peptide transporter PTR2.

- Mammalian intestine proton-dependent oligopeptide transporter PepT1.
- Mammalian kidney proton-dependent oligopeptide transporter PepT2.
- Drosophila optl.
- Arabidopsis thaliana peptide transporters PTR2-A and PTR2-B (also known as the histidine transporting protein
NTR1).

- Arabidopsis thaliana proton-dependent nitrate/chlorate transporter CHL1.

- Lactococcus proton-dependent di- and tri-peptide transporter dtpT. 

- Caenorhabditis elegans hypothetical protein C06G8.2.

- Caenorhabditis elegans hypothetical protein F56F4.5. 
- Caenorhabditis elegans hypothetical protein K04E7.2.

- Escherichia coli hypothetical protein ybgH. 

- Escherichia coli hypothetical protein ydgR. 
- Escherichia coli hypothetical protein yhiP.
- Escherichia coli hypothetical protein yjdL.

- Bacillus subtilis hypothetical protein yclR.

These integral membrane proteins are predicted to comprise twelve transmembrane regions. As signature patterns,
two of the best conserved regions were selected. The first is a region that includes the end of the second transmembrane
region, a cytoplasmic loop as well as the third transmembrane region. The second pattern corresponds to the core of
the fifth transmembrane region.

- Consensus pattern: [GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-[LIV
[GSTAV]-x-[LIVMF]-x(3)-[GA]

- Consensus pattern: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-[LIVMAG]-G-[GSA]-[LIMF]

[ 1] Paulsen I.T., Skurray R.A. Trends Biochem. Sci. 19:404-404(1994).
[ 2] Steiner H.-Y., Naider F., Becker J.M. Mol. Microbiol. 16:825-834(1995).

[1176] 427. Pumilio-family RNA binding domains (aka PUM-HD, Pumilio homology domain)
Puf domains are necessary and sufficient for sequence-specific
RNA binding in fly Pumilio and worm FBF-1 and FBF-2. Both proteins function as translational repressors in early
embryonic development by binding sequences in the 3' UTR of target mRNAs (e.g. the nanos response element (NRE)
in fly Hunchback mRNA, or the point mutation element (PME) in worm fem-3 mRNA). Other proteins that contain Puf
domains are also plausible RNA binding proteins. JSN1_YEAST, for instance, appears to also contain a single RRM
domain by HMM analysis.

Puf domains usually occur as a tandem repeat of 8 domains. 

The Pfam model does not necessarily recognize all 8 domains in all sequences; some sequences appear to have 5
or 6 domains on initial analysis, but further analysis suggests the presence of additional divergent domains.
[1177] [1] Zhang B, Gallegos M, Puoti A, Durkin E, Fields S, Kimble J,
Wickens MP; Nature 1997;390:477-484.
[2] Zamore PD, Williamson JR, Lehmann R; RNA 1997;3:1421-1433.

[1178] 428. PWWP domain. The PWWP domain is named after a conserved Pro-Trp-Trp-Pro motif. The function of
the domain is currently unknown. Number of members: 19

[1179] [1] Medline: 98282232. WHSC1, a 90 kb SET domain-containing gene, expressed in early development and
homologous to a Drosophila dysmorphy gene, maps in the Wolf-Hirschhorn syndrome critical region and is fused to
IgH in t(4;14) multiple myeloma. Stec I, Wright TJ, van Ommen GJB, de Boer PAJ, van Haeringen A, Moorman AFM,
Altherr MR, den Dunnen JT; Hum Mol Genet 1998;7:1071-1082.
[1180] 429. PX domain 

Eukaryotic domain of unknown function present in phox proteins, PLD isoforms and a PI3K isoform.
Number of members: 71 
[1] 

Medline: 97084820 

Novel domains in NADPH oxidase subunits, sorting nexins, and
PtdIns 3-kinases: binding partners of SH3 domains?
Ponting CP; 
Protein Sci 1996;5:2353-2357. 
[1181] 430. ParA family ATPase 
[1] 

Medline: 91141297 

A family of ATPases involved in active partitioning of diverse bacterial plasmids. 
Motallebi-Veshareh M, Rouch DA, Thomas CM; 
Mol Microbiol 1990;4:1455-1463. 




Number of members: 122 

[1182] 431. (Parvo coat) Parvovirus coat protein, 72 members.
[1183] 432. Pectinesterase signatures 

Pectinesterase (EC 3.1.1.11) catalyzes the hydrolysis of pectin into pectate and methanol. In plants, it plays an
important role in cell wall metabolism during fruit ripening. In plant bacterial pathogens, and in fungal pathogens
such as Aspergillus niger, pectinesterase is involved in maceration and soft-rotting of plant tissue.

Prokaryotic and eukaryotic pectinesterases share a few regions of sequence similarity. As signature patterns, two of
these regions were selected: the first contains a conserved tyrosine, and the second is located in the central part of
these enzymes.

- Consensus pattern: [GSTNP]-x(6)-[FYV]-R-[IVN]-[KEP]-x-G-[STIVKRQ]

- Consensus pattern: [IV]-x-G-[STAD]-[LIVT]-D-[FYI]-[IV]-[FSN]-G

[ 1] Ray J., Knapp J., Grierson D., Bird C., Schuch W. Eur. J. Biochem. 174:119-124(1988).

[ 2] Plastow G.S. Mol. Microbiol. 2:247-254(1988).

[ 3] Markovic O., Joernvall H. Protein Sci. 1:1288-1292(1992). 

[1184] 433. Pentapeptide repeats (8 copies) 

These repeats are found in many cyanobacterial proteins.

The repeats were first identified in hglK [1]. The function of these repeats is unknown.
The structure of this repeat has been predicted to be a beta-helix [2].

The repeat can be approximately described as A(D/N)LXX, where X can be any amino acid.
Number of members: 75
[1] Medline: 96062225

The hglK gene is required for localization of heterocyst-specific glycolipids in the cyanobacterium
Anabaena sp. strain PCC 7120.
Black K, Buikema WJ, Haselkorn R;

J Bacteriol 1995;177:6440-6448.

[2] Medline: 98318059
Structure and distribution of pentapeptide repeats in bacteria. 
Bateman A, Murzin A. Teichmann SA; 

Protein Sci 1998;7:1477-1480. 

[3] Medline: 98316713

Characterisation of an Arabidopsis cDNA encoding a thylakoid lumen protein related to a novel 'pentapeptide repeat'
family of proteins.

Kieselbach T, Mant A, Robinson C, Schroder WP;
FEBS Lett 1998;428:241-244.
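The approximate description of the pentapeptide repeat given above, A(D/N)LXX, can be turned into a simple scan for tandem copies. The sketch below is an illustration under stated assumptions, not the Pfam model: tandem_runs is a hypothetical helper, it only finds exact, back-to-back matches of the approximate unit, and it will miss the divergent copies that a profile or HMM search would detect. The test sequence is invented.

```python
import re

# Approximate repeat unit A(D/N)LXX written as a regex: A, then D or N,
# then L, then any two residues. Real repeats are more degenerate, so this
# exact-match regex is only a simplified stand-in for a profile/HMM search.
UNIT = "A[DN]L.."


def tandem_runs(sequence: str, min_copies: int = 2):
    """Return (1-based start, number of copies) for each run of at least
    min_copies back-to-back pentapeptides matching UNIT."""
    if min_copies < 1:
        raise ValueError("min_copies must be at least 1")
    run = re.compile(f"(?:{UNIT}){{{min_copies},}}")
    return [(m.start() + 1, len(m.group()) // 5) for m in run.finditer(sequence)]


if __name__ == "__main__":
    toy = "MSG" + "ADLSG" + "ANLTF" + "ADLRK" + "QQ" + "ADLAA"  # invented
    print(tandem_runs(toy))                # runs of two or more copies
    print(tandem_runs(toy, min_copies=1))  # also reports isolated copies
```

On the toy sequence, the default call reports one run of three copies starting at residue 4; lowering min_copies to 1 also reports the isolated copy at residue 21.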
[1185] 434. Polypeptide deformylase
[1] 

Medline: 97002011 

J Mol Biol 1996;262:375-386. 

[2]Medline: 98332750 
Solution structure of nickel-peptide deformylase. 
Dardel F, Ragusa S, Lazennec C, Blanquet S, Meinnel T;

J Mol Biol 1998;280:501-513.
Number of members: 21 

[1186] 435. Peptidyl-tRNA hydrolase signatures 

Peptidyl-tRNA hydrolase (EC 3.1.1.29) (PTH) is a bacterial enzyme that cleaves peptidyl-tRNA or N-acyl-aminoacyl-
tRNA to yield free peptides or N-acyl-amino acids and tRNA. The natural substrate for this enzyme may be
peptidyl-tRNAs that drop off the ribosome during protein synthesis [1]. Bacterial PTH has been found [2,3] to be
evolutionarily related to yeast hypothetical protein YHR189w.

As signature patterns, two conserved regions were selected that each contain a histidine. The first of these regions is
located in the N-terminal section, the other in the central part.

- Consensus pattern: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE]

- Consensus pattern: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT]

[ 1] Garcia-Villegas M.R., De La Vega F.M., Galindo J.M., Segura M., Buckingham R.H., Guarneros G. EMBO J.
10:3549-3555(1991).

[ 2] De La Vega F.M., Galindo J.M., Old I.G., Guarneros G. Gene 169:97-100(1996).
[ 3] Ouzounis C., Bork P., Casari G., Sander C. Protein Sci. 4:2424-2428(1995).



[1187] 436. (Peptidase M17) Cytosol aminopeptidase signature

Cytosol aminopeptidase is a eukaryotic cytosolic zinc-dependent exopeptidase that catalyzes the removal of unsub-
stituted amino-acid residues from the N-terminus of proteins. This enzyme is often known as leucine aminopeptidase
(EC 3.4.11.1) (LAP) but has been shown [1] to be identical with prolyl aminopeptidase (EC 3.4.11.5). Cytosol ami-
nopeptidase is a hexamer of identical chains, each of which binds two zinc ions.

Cytosol aminopeptidase is highly similar to Escherichia coli pepA, a manganese-dependent aminopeptidase. Residues
involved in zinc ion-binding [2] in the mammalian enzyme are absolutely conserved in pepA, where they presumably
bind manganese.

A cytosol aminopeptidase from Rickettsia prowazekii [3] and one from Arabidopsis thaliana also belong to this family.
As a signature pattern for these enzymes, a perfectly conserved octapeptide was selected which contains two residues
involved in binding metal ions: an aspartate and a glutamate.

- Consensus pattern: N-T-D-A-E-G-R-L [The D and the E are zinc/manganese ligands] 
Note: these proteins belong to family M17 in the classification of peptidases [4,E1J. 


[ 1] Matsushima M., Takahashi T., Ichinose M., Miki K., Kurokawa K., Takahashi K. Biochem. Biophys. Res. Commun. 178:1459-1464(1991).

[ 2] Burley S.K., David P.R., Sweet R.M., Taylor A., Lipscomb W.N. J. Mol. Biol. 224:113-140(1992).
[ 3] Wood D.O., Solomon M.J., Speed R.R. J. Bacteriol. 175:159-165(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1188] 437. Assemblin (Peptidase family S21)
[1] 

Medline: 96399137 
Three-dimensional structure of human cytomegalovirus protease.

Shieh HS, Kurumbail RG, Stevens AM, Stegeman RA, Sturman EJ,
Pak JY, Wittwer AJ, Palmier MO, Wiegand RC, Holwerda BC,
Stallings WC;
Nature 1996;383:279-282.
Number of members: 29

[1189] 438. Pollen proteins Ole e I family signature 

The following plant pollen proteins, whose biological function is not yet known, are structurally related [1]:

- Olive tree pollen major allergen (Ole e I).
- Tomato anther-specific protein LAT52.
- Maize pollen-specific protein ZmC13.

These proteins are most probably secreted and consist of about 145 residues. As shown in the following schematic representation, there are six cysteines which are conserved in the sequence of these proteins. They seem to be involved in disulfide bonds.



xxxxxxCxCxxxxxxxxxCxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxxxxxxxxxxxCxxxxxxx 
'C': conserved cysteine involved in a disulfide bond.

Consensus pattern: [EQ]-G-x-V-Y-C-D-T-C-R [The two C's are probably involved in disulfide bonds]

[1190] [ 1] Villalba M., Batanero E., Lopez-Otin C., Sanchez L.M., Monsalve R.I., Gonzalez De La Pena M.A., Lahoz C., Rodriguez R. Eur. J. Biochem. 216:863-869(1993).
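The cysteine schematic given for the Ole e I family encodes the layout as a string in which 'C' marks a conserved cysteine and 'x' any other residue. As a small illustration (the helper name cysteine_layout is hypothetical), the Python sketch below recovers the cysteine positions and the number of residues separating consecutive cysteines from that string. The schematic is only approximate, so the numbers describe the drawing rather than any particular protein.

```python
# Schematic copied from the text: 'C' = conserved cysteine, 'x' = any other residue.
SCHEMATIC = "xxxxxxCxCxxxxxxxxxCxxxxxxxxxxxxxxxxxCxxxxxCxxxxxxxxxxxxxxxxxxxxCxxxxxxx"


def cysteine_layout(schematic: str) -> tuple[list[int], list[int]]:
    """Return 1-based cysteine positions and the residue gaps between consecutive cysteines."""
    positions = [i + 1 for i, ch in enumerate(schematic) if ch == "C"]
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    return positions, gaps


positions, gaps = cysteine_layout(SCHEMATIC)
print(len(positions), "cysteines at", positions)
print("residues between consecutive cysteines:", gaps)
```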
[1191] 439. Pollen allergen 

This family contains allergens Lol p I, Lol p II and Lol p III from Lolium perenne.
Number of members: 49
[1]

Medline: 90105394

[...]

Ansari AA, Shenbagamurthi P, Marsh DG;
Biochemistry 1989;28:8665-8670.
[1192] 440. Porphobilinogen deaminase cofactor-binding site

Porphobilinogen deaminase (EC 4.3.1.8), or hydroxymethylbilane synthase, is an enzyme involved in the biosynthesis [...] a dipyrromethane cofactor to which the PBG subunits are added in a stepwise fashion. [...] is bound by the sulfur atom of a cysteine [...] porphobilinogen deaminases from various prokaryotic and [...]



Consensus pattern: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA] [C is the cofactor attachment site]

[1193] [ 1] Miller A.D., Hart G.J., Packman L.C., Battersby A.R. Biochem. J. 254:915-918(1988).
[1194] 441. Presenilin

[...] [4]. This family also contains SPE proteins from C. elegans.

[1] 

Medline: 98045995 

Presenilins and Alzheimer's disease. 

Kim TW, Tanzi RE; 

Curr Opin Neurobiol 1997;7:683-688.

[2]
Medline: 98045995
Presenilins and Alzheimer's disease.
Kim TW, Tanzi RE;

Curr Opin Neurobiol 1997;7:683-688.

[3]
Medline: 98099802
Interaction of presenilins with the filamin family of actin-binding proteins.
Zhang W, Han SW, McKeel DW, Goate A, Wu JY;

J Neurosci 1998;18:914-922.

[4]
Medline: 99004850

Destabilisation of beta-catenin by mutations in presenilin-1 potentiates neuronal apoptosis.

Zhang Z, Hartmann H, Do VM, Abramowski D, Sturchler-Pierrat C,
Staufenbiel M, Sommer B, van de Wetering M, Clevers H,
Saftig P, De Strooper B, He X, Yankner BA;

Nature 1998;395:698-702. 
[1195] 442. (Pribosyltran) Purine/pyrimidine phosphoribosyl transferases signature

Phosphoribosyltransferases (PRT) are enzymes that catalyze the synthesis of beta-n-5'-monophosphates from phosphoribosylpyrophosphate (PRPP) and an enzyme-specific amine. A number of PRTs are involved in the biosynthesis of purine, pyrimidine, and pyridine nucleotides, or in the salvage of purines and pyrimidines. These enzymes are:

- Adenine phosphoribosyltransferase (EC 2.4.2.7) (APRT), which is involved in purine salvage.

- Hypoxanthine-guanine or hypoxanthine phosphoribosyltransferase (EC 2.4.2.8) (HGPRT or HPRT), which are involved in purine salvage.

- Orotate phosphoribosyltransferase (EC 2.4.2.10) (OPRT), which is involved in pyrimidine biosynthesis.

- Amidophosphoribosyltransferase (EC 2.4.2.14), which is involved in purine biosynthesis.

- Xanthine-guanine phosphoribosyltransferase (EC 2.4.2.22) (XGPRT), which is involved in purine salvage.

In the sequence of all these enzymes there is a small conserved region which may be involved in the enzymatic activity and/or be part of the PRPP binding site [1].

- Consensus pattern: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-[STAR]-[GAC]-x-[STAR]

- Note: in position 11 of the pattern most of these enzymes have Gly. 
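The note above counts pattern positions element by element, with a repeat such as x(3) occupying three positions. A minimal Python sketch of that bookkeeping follows. It assumes the PRT pattern reads [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-[STAR]-[GAC]-x-[STAR]; the helper name expand_positions is hypothetical, and variable-length elements such as x(0,1) are deliberately rejected because they have no fixed position.

```python
import re

# Assumed reading of the PRT signature (see the lead-in).
PRT_PATTERN = ("[LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-"
               "[STAVD]-[STAR]-[GAC]-x-[STAR]")

# Fixed-length element: x, a residue, [set] or {set}, with an optional (n) repeat.
_FIXED_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)\))?")


def expand_positions(pattern: str) -> list[str]:
    """List the element occupying each position of a fixed-length pattern."""
    positions = []
    for element in pattern.split("-"):
        m = _FIXED_ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"not a fixed-length element: {element!r}")
        residue, count = m.groups()
        positions.extend([residue] * int(count or 1))
    return positions


positions = expand_positions(PRT_PATTERN)
print(len(positions), "positions; position 11 is", positions[10])
```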

[1196] [ 1] Hershey H.V., Taylor M.W. Gene 43:287-293(1986).
[1197] 443. (Pro CA) 

Prokaryotic-type carbonic anhydrases signatures 

Carbonic anhydrases (EC 4.2.1.1) (CA) are zinc metalloenzymes which catalyze the reversible hydration of carbon dioxide. In Escherichia coli, CA (gene cynT) is involved in recycling carbon dioxide formed in the bicarbonate-dependent decomposition of cyanate by cyanase (gene cynS). By this action, it prevents the depletion of cellular bicarbonate [1]. In photosynthetic bacteria and plant chloroplasts, CA is essential to inorganic carbon fixation [2]. Prokaryotic and plant chloroplast CAs are structurally and evolutionarily related and form a family distinct from the one which groups the many different forms of eukaryotic CAs (see <PDOC00146>). Hypothetical proteins yadF from Escherichia coli and HI1301 from Haemophilus influenzae also belong to this family. Two signature patterns were developed for this family of enzymes. Both patterns contain conserved residues that could be involved in binding zinc (cysteine and histidine).

- Consensus pattern: C-[SA]-D-S-R-[LIVM]-x-[AP]

- Consensus pattern: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G

[ 1] Guilloton M.B., Korte J.J., Lamblin A.F., Fuchs J.A., Anderson P.M. J. Biol. Chem. 267:3731-3734(1992).
[ 2] Fukuzawa H., Suzuki E., Komukai Y., Miyachi S. Proc. Natl. Acad. Sci. U.S.A. 89:4437-4441(1992).

[1198] 444. (Prolyl_oligopep)

Prolyl oligopeptidase family serine active site 

[1199] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related peptidases whose catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. The known members of this family are listed below:

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high degree of sequence conservation between these sequences.

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB), which cleaves peptide bonds on the C-terminal side of lysyl and argininyl residues.

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially from polypeptides having unsubstituted N-termini, provided that the penultimate residue is proline.

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13), which is responsible for the proteolytic maturation of the alpha-factor precursor.

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).

- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein with a free amino-terminus.



[1200] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that contain about 700 to 800 amino acids).

[1201] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
Sequences known to belong to this class detected by the pattern: ALL, except for yeast DPAP A.
[1202] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4].
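The prolyl oligopeptidase entry places the active-site serine about 150 residues from the C-terminus. The Python sketch below, built around a hypothetical helper named active_serine_sites, writes the consensus pattern directly as a regular expression, captures the serine, and reports its 1-based position together with the number of residues that follow it. The test sequence is synthetic and does not represent a real enzyme.

```python
import re

# D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2), with the active-site S captured.
POP_SERINE = re.compile(r"D.{3}A.{3}[LIVMFYW].{14}G.(S).GG[LIVMFYW]{2}")


def active_serine_sites(sequence: str) -> list[tuple[int, int]]:
    """Return (1-based Ser position, residues between that Ser and the C-terminus) per match."""
    sites = []
    for m in POP_SERINE.finditer(sequence):
        pos = m.start(1) + 1
        sites.append((pos, len(sequence) - pos))
    return sites


# Synthetic sequence: 10-residue N-terminal filler, one motif, 150-residue C-terminal tail.
motif = "D" + "AAA" + "A" + "AAA" + "L" + "A" * 14 + "G" + "A" + "S" + "A" + "GG" + "LL"
sequence = "M" * 10 + motif + "K" * 150
print(active_serine_sites(sequence))
```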

[ 1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[ 2] Barrett A.J., Rawlings N.D.
[ 3] Polgar L., Szabo E.

[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1203] 445. (Pterin 4a) 

Pterin 4 alpha carbinolamine dehydratase 

[1204] Pterin 4 alpha carbinolamine dehydratase is also known as DCoH (dimerisation cofactor of hepatocyte nuclear factor 1-alpha).

[1205] Number of members: 11 

[1206] [1] Cronk JD, Endrizzi JA, Alber T; Medline: 97052967. High-resolution structures of the bifunctional enzyme and transcriptional coactivator DCoH and its complex with a product analogue. Protein Sci 1996;5:1963-1972.
[1207] 446. (Pyridox oxidase) 
Pyridoxamine 5'-phosphate oxidase signature 

[1208] Pyridoxamine 5'-phosphate oxidase (EC 1.4.3.5) is an FMN flavoprotein involved in the de novo synthesis of pyridoxine (vitamin B6) and pyridoxal phosphate. It oxidizes pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to pyridoxal-5-P. The sequences of the enzyme from bacterial (genes pdxH or fprA) [1] and fungal (gene PDX3) [2] sources show that this protein has been highly conserved throughout evolution.

PdxH is evolutionarily related [3] to phzD (also known as phzG), one of the enzymes of the phenazine biosynthesis pathway. As a signature pattern, a highly conserved region located in the C-terminal part of these enzymes was selected.

- Consensus pattern: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R

[ 1] Lam H.-M., Winkler M.E. J. Bacteriol. 174:6033-6045(1992).

[ 2] Loubbardi A., Karst P., Guilloton M., Marcireau C. J. Bacteriol. 177:1817-1823(1995).

[ 3] Pierson L.S. III, Gaffney T., Um S., Gong F. FEMS Microbiol. Lett. 134:299-307(1995).

[1209] 447. (Pyrophosphatase) 
Inorganic pyrophosphatase signature 

[1210] Inorganic pyrophosphatase (EC 3.6.1.1) (PPase) [1,2] is the enzyme responsible for the hydrolysis of pyrophosphate (PPi), which is formed principally as the product of the many biosynthetic reactions that utilize ATP. All known PPases require the presence of divalent metal cations, with magnesium conferring the highest activity. Among other residues, a lysine has been postulated to be part of or close to the active site. PPases have been sequenced from bacteria such as Escherichia coli (homohexamer), thermophilic bacterium PS-3 and Thermus thermophilus, from the archaebacterium Thermoplasma acidophilum, from fungi (homodimer), from a plant, and from bovine retina. In yeast, a mitochondrial isoform of PPase has been characterized which seems to be involved in energy production and whose activity is stimulated by uncouplers of ATP synthesis.

[1211] The sequences of PPases share some regions of similarity. As a signature pattern, a region was selected that contains three conserved aspartates that are involved in the binding of cations.

- Consensus pattern: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC]
[The three D's bind divalent metal cations] 

[ 1] Lahti R., Kolakowski L.F. Jr., Heinonen J., Vihinen M., Pohjanoksa K., Cooperman B.S. Biochim. Biophys. Acta 1038:338-345(1990).

[ 2] Cooperman B.S., Baykov A.A., Lahti R. Trends Biochem. Sci. 17:262-266(1992).

[1212] 448. (Peptidase S26)
Signal peptidases I signatures

[1213] Signal peptidases (SPases) [1] (also known as leader peptidases) remove the signal peptides from secretory proteins. In prokaryotes three types of SPases are known: type I (gene lepB), which is responsible for the processing of the majority of exported pre-proteins; type II (gene lsp), which only processes lipoproteins; and a third type involved in the processing of pili subunits. SPase I (EC 3.4.21.89) is an integral membrane protein that is anchored in the cytoplasmic membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein protruding into the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. SPase I is evolutionarily related to the yeast mitochondrial inner membrane protease subunits 1 and 2 (genes IMP1 and IMP2), which catalyze the removal of signal peptides required for the targeting of proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4].

In eukaryotes the removal of signal peptides is effected by an oligomeric enzymatic complex composed of at least five subunits: the signal peptidase complex (SPC). The SPC is located in the endoplasmic reticulum membrane. Two components of mammalian SPC, the 18 Kd (SPC18) and the 21 Kd (SPC21) subunits, as well as the yeast SEC11 subunit, have been shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2.

Three signature patterns have been developed for these proteins. The first signature contains the putative active site serine; the second contains the putative active site lysine, which is not conserved in the SPC subunits; and the third corresponds to a conserved region of unknown biological significance located in the C-terminal section of all these proteins.

[1214] Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]

Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]

Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[ 1] [...] Biochem. Sci. 17:474-478(1992).
[ 2] Sung M., Dalbey R.E. J. Biol. Chem. 267:13154-13159(1992).
[ 3] Black M.T. J. Bacteriol. 175:4957-4961(1993).
[ 4] Nunnari J., Fox T.D., Walter P. [...]
[ 5] [...] de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1216] 449. (Peptidase C1) Eukaryotic thiol (cysteine) proteases active sites

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes which contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. The proteases which are currently known to belong to this family are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].
- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C).
- [...] both an N-terminal catalytic domain and a C-terminal calcium-binding domain.
- Mammalian cathepsin K, which seems to be involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].
- Bleomycin hydrolase, an enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).
- Plant enzymes: barley aleurain (EC 3.4.22.16); EP-B1/B4; kidney bean EP-C1; rice bean SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; [...] COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced [...]; Arabidopsis thaliana A494, RD19A and RD21A.
- House-dust mites allergens DerP1 and EurM1.
- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonicum (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and CP-3).
- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.
- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
- Drosophila small [...] a calpain-like domain.
- Yeast thiol protease BLH1/YCP1/LAP3.
- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein.

Two bacterial peptidases are also part of this family:

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5].
- Thiol protease tpr from Porphyromonas gingivalis.

Three other proteins are structurally related to this family, but may have lost their proteolytic activity:

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine.
- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but with the active site cysteine replaced by a serine. Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see <PDOC00382>).
- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This protein of 111 Kd possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.

[1217] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]
Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu), bleomycin hydrolase (Ser) [...]

Consensus pattern: [...]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-[...]

[1219] [ 1] Dufour E. Biochimie 70:1335-1342(1988).
[ 2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).
[ 3] Shi G.-P., Chapman H.A., Bhairi S.M., DeLeeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[ 4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[ 5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333(1993).
[ 6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[ 7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1220] 450. (Peptidase M24) Aminopeptidase P and proline dipeptidase signature (1).

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl-terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a conserved region that contains three histidine residues has been developed.

[1221] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[1222] [ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).
[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1223] Methionine aminopeptidase signatures (2). Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAPs studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first subfamily consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAPs but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern that includes residues known to be involved in cobalt-binding has been developed.
[1224] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]
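Because each MAP subfamily has its own signature, a sequence can be assigned to a subfamily by checking which pattern it matches. The Python sketch below assumes, as the order of presentation suggests but the text does not state outright, that the first pattern is the subfamily 1 signature and the second the subfamily 2 signature. The function name map_subfamilies and the example fragments are hypothetical.

```python
import re

# Regex forms of the two MAP signatures (assumed order: subfamily 1, then subfamily 2).
MAP_SIGNATURES = {
    1: re.compile(r"[MFY].GHG[LIVMC][GSH].{3}H.{4}[LIVM].[HN][YWV]"),
    2: re.compile(r"[DA][LIVMY].K[LIVM]D.G.[HQ][LIVM][DNS]G.{3}[DN]"),
}


def map_subfamilies(sequence: str) -> list[int]:
    """Return the subfamilies whose signature occurs in the sequence (empty if none)."""
    return [family for family, regex in MAP_SIGNATURES.items() if regex.search(sequence)]


print(map_subfamilies("MAGHGIGAAAHAAAALAHY"))  # invented fragment fitting the first pattern
print(map_subfamilies("DLAKIDAGAHVNGAAAD"))  # invented fragment fitting the second pattern
```

A sequence matching neither pattern returns an empty list; this says nothing about whether it is a MAP, only that neither signature is present.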

[1225] [ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).
[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1226] 451. Cytochrome P450 cysteine heme-iron ligand signature

Cytochrome P450's [1,2,3,E1] are a group of enzymes involved in the oxidative metabolism of a large number of natural compounds (such as steroids, fatty acids, prostaglandins, leukotrienes, etc.) as well as drugs, carcinogens and mutagens. Based on sequence similarities, P450's have been classified into about forty different families [4,5]. P450's are proteins of 400 to 530 amino acids; the only exception is Bacillus BM-3 (CYP102), which is a protein of 1048 residues that contains an N-terminal P450 domain followed by a reductase domain. P450's are heme proteins. A conserved cysteine residue in the C-terminal part of P450's is involved in binding the heme iron in the fifth coordination site. From a region around this residue, a ten-residue signature specific to P450's was developed.
[1227] Consensus pattern: [FW]-[SGNH]-x-[GD]-x-[RKHPT]-x-C-[LIVMFAP]-[GAD] [C is the heme iron ligand]

[ 1] Nebert D.W., Gonzalez F.J. Annu. Rev. Biochem. 56:945-993(1987).

[ 2] Coon M.J., Ding X., Pernecky S.J., [...] A.D.N. FASEB J. 6:669-673(1992).

[ 3] Guengerich F.P. J. Biol. Chem. 266:10019-10022(1991).

[ 4] Nelson D.R., Kamataki T., Waxman D.J., Guengerich F.P., Estabrook R.W., Feyereisen R., Gonzalez F.J., Coon M.J., Gunsalus I.C., Gotoh O., Okuda K., Nebert D.W. DNA Cell Biol. 12:1-51(1993).
[ 5] Degtyarenko K.N., Archakov A.I. FEBS Lett. 332:1-8(1993).

[1228] 452. (Pec Lyase) Pectate lyase 

This enzyme forms a right-handed beta helix structure. Pectate lyase is an enzyme involved in the maceration and soft rotting of plant tissue.

[1229] [1] Yoder MD, Keen NT, Jurnak F; Science 1993;260:1503-1507.

[1230] 453. (pep M24) Aminopeptidase P and proline dipeptidase signature (pep1)

Aminopeptidase P (EC 3.4.11.9) is the enzyme responsible for the release of any N-terminal amino acid adjacent to a proline residue. Proline dipeptidase (EC 3.4.13.9) (prolidase) splits dipeptides with a prolyl residue in the carboxyl-terminal position. Bacterial aminopeptidase P II (gene pepP) [1], proline dipeptidase (gene pepQ) [2], and human proline dipeptidase (gene PEPD) [3] are evolutionarily related. These proteins are manganese metalloenzymes. Yeast hypothetical proteins YER078c and YFR006w and Mycobacterium tuberculosis hypothetical protein MtCY49.29c also belong to this family. As a signature pattern for these enzymes, a conserved region was selected that contains three histidine residues.

[1231] Consensus pattern: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE]

[ 1] Yoshimoto T., Tone H., Honda T., Osatomi K., Kobayashi R., Tsuru D. J. Biochem. 105:412-416(1989).
[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:6439-6439(1990).

[ 3] Endo F., Tanoue A., Nakai H., Hata A., Indo Y., Titani K., Matsuda I. J. Biol. Chem. 264:4476-4481(1989).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1232] Methionine aminopeptidase signatures (pep2) 

Methionine aminopeptidase (EC 3.4.11.18) (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAPs studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [1,2]. While being evolutionarily related, they only share a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [3], to be involved in cobalt-binding. The first subfamily consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaebacterial MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAPs but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. For each of these subfamilies, a specific signature pattern was developed that includes residues known to be involved in cobalt-binding.

[1233] Consensus pattern: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV] [H is a cobalt ligand]
Consensus pattern: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN] [The second D and the last D/N are cobalt ligands]

[ 1] Arfin S.M., Kendall R.L., Hall L., Weaver L.H., Stewart A.E., Matthews B.W., Bradshaw R.A. Proc. Natl. Acad. Sci. U.S.A. 92:7714-7718(1995).

[ 2] Keeling P.J., Doolittle W.F. Trends Biochem. Sci. 21:285-286(1996).
[ 3] Roderick S.L., Matthews B.W. Biochemistry 32:3907-3912(1993).
[ 4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1234] 454. Peroxidases signatures 

Peroxidases (EC 1.11.1.-) [1] are heme-binding enzymes that carry out a variety of biosynthetic and degradative functions using hydrogen peroxide as the electron acceptor. Peroxidases are widely distributed throughout bacteria, fungi, plants, and vertebrates. In peroxidases the heme prosthetic group is protoporphyrin IX and the fifth ligand of the heme iron is a histidine (known as the proximal histidine). Another histidine residue (the distal histidine) serves as an acid-base catalyst in the reaction between hydrogen peroxide and the enzyme. The regions around these two active site residues are more or less conserved in a majority of peroxidases [2,3]. The enzymes in which one or both of these regions can be found are listed below.

- Yeast cytochrome c peroxidase (EC 1.11.1.5).
- Myeloperoxidase (EC 1.11.1.7) (MPO). MPO is found in granulocytes and monocytes and plays a major role in the oxygen-dependent microbicidal system of neutrophils.
- Lactoperoxidase (EC 1.11.1.7) (LPO). LPO is a milk protein which acts as an antimicrobial agent.
- Eosinophil peroxidase (EC 1.11.1.7) (EPO). An enzyme found in the cytoplasmic granules of eosinophils.
- Thyroid peroxidase (EC 1.11.1.8) (TPO). TPO plays a central role in the biosynthesis of thyroid hormones. It catalyzes the iodination and coupling of the hormonogenic tyrosines in thyroglobulin to yield the thyroid hormones T3 and T4.
- Fungal ligninases. Ligninase catalyzes the first step in the degradation of lignin. It depolymerizes lignin by catalyzing the C(alpha)-C(beta) cleavage of the propyl side chains of lignin.
- Plant peroxidases (EC 1.11.1.7). Plants express a large number of isozymes of peroxidases. Some of them play a role in cell suberization by catalyzing the deposition of the aromatic residues of suberin on the cell wall, some are expressed as a defense response toward wounding, and others are involved in the metabolism of auxin and the biosynthesis of lignin.
- Prokaryotic catalase-peroxidases. Some bacterial species produce enzymes that exhibit both catalase and broad-spectrum peroxidase activities [4]. Examples of such enzymes are catalase HPI from Escherichia coli (gene katG) and perA from Bacillus stearothermophilus.
[1235] Consensus pattern: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY] [H is the proximal heme-binding ligand]

Consensus pattern: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC] [H is an active site residue]

[ 1] Dawson J.H. Science 240:433-439(1988).
[ 2] Kimura S., Ikeda-Saito M. Proteins 3:113-120(1988).
[ 3] Henrissat B., Saloheimo M., Lavaitte S., Knowles J.K.C. Proteins 8:251-257(1990).
[ 4] Welinder K.G. Biochim. Biophys. Acta 1080:215-220(1991).

[1236] 455. pfkB family of carbohydrate kinases signatures 

It has been shown [1,2,3] that the following carbohydrate and purine kinases are evolutionarily related and can be grouped into a single family, which is known [1] as the 'pfkB family':

- Fructokinase (EC 2.7.1.4) (gene scrK).
- 6-phosphofructokinase isozyme 2 (EC 2.7.1.11) (phosphofructokinase-2) (gene pfkB). pfkB is a minor phosphofructokinase isozyme in Escherichia coli and is not evolutionarily related to the major isozyme (gene pfkA). Plant 6-phosphofructokinases also belong to this family.
- Ribokinase (EC 2.7.1.15) (gene rbsK).
- Adenosine kinase (EC 2.7.1.20) (gene ADK).
- 2-dehydro-3-deoxygluconokinase (EC 2.7.1.45) (gene kdgK).
- 1-phosphofructokinase (EC 2.7.1.56) (fructose [...]
- [...] (EC 2.7.1.144) (phosphotagatokinase) (gene lacC).
- Escherichia coli hypothetical protein yeiC.
- Escherichia coli hypothetical protein yeiI.
- Escherichia coli hypothetical protein yhfQ.
- Escherichia coli hypothetical protein yihW.
- Bacillus subtilis hypothetical protein yxdC.
- Yeast hypothetical protein YJR105w.

All the above kinases are proteins of from 280 to 430 amino acid residues that share a few regions of sequence similarity. Two of these regions were selected as signature patterns. The first pattern is based on a glycine-rich region located in the N-terminal section of these enzymes, while the second pattern is based on a conserved region in the C-terminal section.
[1237] Consensus pattern: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G
Consensus pattern: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFYA]-[LIVMSTAP]

[ 1] Wu L.-F., Reizer A., Reizer J., Cai B., Tomich J.M., Saier M.H. Jr. J. Bacteriol. 173:3117-3127(1991).
[ 2] Orchard L.M.D., Kornberg H.L. Proc. R. Soc. Lond., B, Biol. Sci. 242:87-90(1990).
[ 3] Blatch G.L., Scholle R.R., Woods D.R. Gene 95:17-23(1990).

[1238] 456. Phospholipase A2 active sites signatures 

Phospholipase A2 (EC 3.1.1.4) (PA2) [1,2] is an enzyme which releases fatty acids from the second carbon group of glycerol. PA2s are small and rigid proteins of 120 amino-acid residues that have four to seven disulfide bonds. PA2 binds a calcium ion which is required for activity. The side chains of two conserved residues, a histidine and an aspartic acid, participate in a 'catalytic network'. Many PA2's have been sequenced from snakes, lizards, bees and mammals. In the latter, there are at least four forms: pancreatic, membrane-associated, as well as two less characterized forms. The venom of most snakes contains multiple forms of PA2. Some of them are presynaptic neurotoxins which inhibit neuromuscular transmission by blocking acetylcholine release from the nerve termini. Two different signature patterns were derived for PA2's. The first is centered on the active site histidine and contains three cysteines involved in disulfide bonds. The second is centered on the active site aspartic acid and also contains three cysteines involved in disulfide bonds.

[1239] Consensus pattern: C-C-x(2)-H-x(2)-C [H is the active site residue]. This pattern will not detect some snake
toxins homologous with PA2 which have lost their catalytic activity, nor otoconin-22, a Xenopus protein from
the aragonitic otoconia which is also unlikely to be enzymatically active.

Consensus pattern: [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C [D is the active site residue]. This pattern detects the majority of functional
and non-functional PA2's. Undetected sequences are bee PA2, gila monster PA2's, PA2 PL-X from habu and PA2
PA-5 from mulga.
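Because each PA2 signature marks one catalytic residue, it can be useful to report where that residue falls in a query sequence. The Python sketch below hand-translates the two patterns into regular expressions with a capture group around the marked His or Asp; the function name `pa2_active_sites` and the test peptide are hypothetical, and a match only indicates the signature, not a functional enzyme (as the notes on snake toxins and otoconin-22 show).

```python
import re

# Hand-translated regexes; the capture group marks the catalytic residue.
PA2_PATTERNS = {
    "His": re.compile(r"CC..(H)..C"),                       # C-C-x(2)-H-x(2)-C
    "Asp": re.compile(r"[LIVMA]C[^LIVMFYWPCST]C(D).{5}C"),  # [LIVMA]-C-{LIVMFYWPCST}-C-D-x(5)-C
}

def pa2_active_sites(seq):
    """Return (label, 1-based position) for each marked active-site residue found."""
    hits = []
    for label, regex in PA2_PATTERNS.items():
        for m in regex.finditer(seq):
            hits.append((label, m.start(1) + 1))
    return hits

print(pa2_active_sites("GCCSAHKNCGLCNCDAAAAAC"))  # made-up peptide carrying both motifs
```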

[ 1] Davidson F.F., Dennis E.A. J. Mol. Evol. 31:228-238(1990).

[ 2] Gomez R., Vandermeers A., Vandermeers-Piret M.-C., Herzog R., Rathe J., Stievenart M., Winand J., Christophe J. Eur. J. Biochem. 186:23-33(1989).

[1240] 457. Phosphorylase pyridoxal-phosphate attachment site. Phosphorylases (EC 2.4.1.1) [1] are important allosteric
enzymes in carbohydrate metabolism. They catalyze the formation of glucose 1-phosphate from polyglucose
such as glycogen, starch or maltodextrin. Enzymes from different sources differ in their regulatory mechanisms and
their natural substrates. However, all known phosphorylases share catalytic and structural properties. They are
pyridoxal-phosphate dependent enzymes; the pyridoxal-P group is attached to a lysine residue around which the sequence
is highly conserved and can be used as a signature pattern to detect this class of enzymes.
[1241] Consensus pattern: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N [K is the pyridoxal-P attachment site].
[ 1] Fukui T., Shimomura S., Nakano K. Mol. Cell. Biochem. 42:129-144(1982).
[1242] 458. Protein kinases signatures and profile 

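Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share a
conserved catalytic core common with both serine/threonine and tyrosine protein kinases. There are a number of
conserved regions in the catalytic domain of protein kinases, and two of these regions were selected to build signature
patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue that has been shown to be involved in ATP binding. The second region,
which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important
for the catalytic activity of the enzyme [6]; two signature patterns were derived for that region: one specific for
serine/threonine kinases and the other for tyrosine kinases. A profile was also developed which is based on the alignment
in [1] and covers the entire catalytic domain.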

The majority of known protein kinases belong to the class detected by this pattern, but it fails to find a number of them,
especially viral kinases, which are quite divergent in this region and are completely missed by this pattern.

f:^L=s-rpr^^^^^^^^ 

ThlTprofilealso^eCrLet^^^^^^^^^^ 

tween these two familiesand the euteryXSnLr^^^^^ Sequence similarities be- 

[ 1] Hanks S.K., Hunter T. FASEB J. 9:576-596(1995). 

[ 2] Hunter T. Meth. Enzymol. 200:3-37(1991).

[ 3] Hanks S.K., Quinn A.M. Meth. Enzymol. 200:38-62(1991).

[ 4] Hanks S.K. Curr. Opin. Struct. Biol. 1:369-383(1991).

[ 5] Hanks S.K., Quinn A.M., Hunter T. Science 241:42-52(1988).

[ 6] Knighton D.R., Zheng J., Ten Eyck L.F., Ashford V.A., Xuong N.-H., Taylor S.S., Sowadski J.M. Science 253:407-414(1991).

[ 7] Bairoch A., Claverie J.-M. Nature 331:22(1988).

[ 8] Benner S. Nature 329:21-21(1987).

[ 9] Kirby R. J. Mol. Evol. 30:489-492(1992).

[10] Littler E., Stuart A.D., Chee M.S. Nature 358:160-162(1992).

[11] Munoz-Dorado J., Inouye S., Inouye M. Cell 67:995-1006(1991).

[1245] Receptor tyrosine kinase class II signature 

products of a precursor molecule The ainha nh^ir. oonas. i ne alpha and beta chains are cleavage 

membrane and^ontainsre yr^inZrn^^^^^^^^^^^ S^^'^! ^ ''t' '^^"^^^-^^^ 

■ Insulin receptor from vertebrates Insu n nm^h? ? f ''""^"^^ '° ''«'o"9 ^ class 11 are- 

(IRR). Which I most p^babV a Lmo fo,^^^^^^^^^^ " receptor-related receptor 

Molluscan insulin-related SweTs) r^^^^^^^^^^ inlTT ' '^"y- " '"^^'^ ^^'^^Pto!^- " 

- The Drosophila developme'rJSp'LS^^^^^^^ puS^ 1' tort "T?": Branchiostoma lanceotetum. 
mationoftheR7photorecep.orcel.-Thet;?l'^:f'.c^^^^^^^^ 
receptors for nerve growth factor and related neurotrophic facto j (BDNf and N^^^ 
receptors: -ROS.-LTK(TYKI) -EDDR1 fcak TRKF f^T° t,^^^^ 

tyrosine kinase. While only the insuihn a^d Se iSL^S , ' '"^"^^ ^^'"^^ ^^^^P^^ 

conformatbn specie to ciass II Sk^s^^ H IZTnl T '° 
especial^, around the putative si Of alX^^^^^^ their kinase domain. 

Rm. Which incudes'the tyrosine restuTSp7oSa^^^^^^^^ ''-loped forth. c^ss o, 
[1248] Receptor tyrosine kinase class III signature

[1249] Consensus pattern: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T.

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988).

[ 2] Hunkapiller T., Hood L. Adv. Immunol. 44:1-63(1989).

[ 3] Lee K.-H., Bowen-Pope D.F., Reed R.R. Mol. Cell. Biol. 10:2237-2246(1990).

[1250] Receptor tyrosine kinase class V signatures 

Consensus pattern: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-C-x-C-x(2)-G-[HFY]-[EQ] [The three C's are
probably involved in disulfide bonds]

[ 1] Yarden Y., Ullrich A. Annu. Rev. Biochem. 57:443-478(1988) 

[ 2] Sajjadi F.G., Pasquale E.B., Subramani S. New Biol. 3:769-778(1991).

[ 3] Wicks I.P., Wilkinson D., Salvaris E., Boyd A.W. Proc. Natl. Acad. Sci. U.S.A. 89:1611-1615(1992).

[1252] 459. Protein kinase C terminal domain
[1253] 460. Plant thionins signature

xxCCxxxxxxxxxxxCxxxxxxxxxCxxxCxxCxxxxxCxxxxxxxx

'C': conserved cysteine involved in a disulfide bond.

[1254] Consensus pattern: C-C-x(5)-R-x(2)-[FY]-x(2)-C [The three C's are involved in disulfide bonds]. The proteins
from the gamma-thionin family are not related to the above proteins and are described in a separate section.

[ 1] Vernon L.P., Evett G.E., Zeikus R.D., Gray W.R. Arch. Biochem. Biophys. 238:18-29(1985).

[ 2] Bohlmann H., Clausen S., Behnke S., Giese H., Hiller C., Reimann-Philipp U., Schrader G., Barkholt V., Apel K. EMBO J. 7:1559-1565(1988).

[ 3] Bohlmann H., Apel K. Mol. Gen. Genet. 207:446-454(1987).

[ 4] Teeter M.M., Mazer J.A., L'Italien J.J. Biochemistry 20:5437-5443(1981).

[1255] 461. Polyprenyl synthetases signatures 

A variety of isoprenoid compounds are synthesized by various organisms. For example, in eukaryotes the isoprenoid
biosynthetic pathway is responsible for the synthesis of a variety of end products including cholesterol, dolichol,
ubiquinone or coenzyme Q. In bacteria this pathway leads to the synthesis of isopentenyl tRNA, isoprenoid quinones and
sugar carrier lipids. Among the enzymes that participate in that pathway are a number of polyprenyl synthetase enzymes
which catalyze a 1'-4 condensation between 5-carbon isoprene units. Currently the sequences of some of these
enzymes are known: - Eukaryotic farnesyl pyrophosphate synthetase (FPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10), which
catalyzes the sequential condensation of isopentenyl pyrophosphate (IPP) with dimethylallyl pyrophosphate (DMAPP)
and then with the resultant geranyl pyrophosphate to form farnesyl pyrophosphate. FPP synthetase is a cytoplasmic
dimeric enzyme. - Prokaryotic farnesyl pyrophosphate synthetase (gene ispA). - Prokaryotic octaprenyl diphosphate
synthase (gene ispB). - Prokaryotic heptaprenyl diphosphate synthase (EC 2.5.1.30). - Eukaryotic geranylgeranyl
pyrophosphate synthetase (GGPP synthetase) (EC 2.5.1.1 / EC 2.5.1.10 / EC 2.5.1.29), which catalyzes the sequential
addition of three molecules of IPP onto DMAPP to form geranylgeranyl pyrophosphate. In plants GGPP synthase
is a chloroplast enzyme involved in the biosynthesis of terpenoids; in fungi, such as Neurospora crassa (gene al-3),
this enzyme is involved in the biosynthesis of carotenoids. - Prokaryotic GGPP synthetases, which are involved in the
biosynthesis of carotenoids (gene crtE). Such an enzyme is also encoded in the cyanelle genome of Cyanophora
paradoxa. - Eukaryotic hexaprenyl pyrophosphate synthetase (HP synthetase), which is involved in the biosynthesis of
coenzyme Q and which catalyzes the formation of all-trans polyprenyl pyrophosphates generally ranging in length from 6
to 10 isoprene units, depending on the species. HP synthetase is a mitochondrial membrane-associated enzyme. It
has been shown [1 to 5] that all the above enzymes share some regions of sequence similarity. Two of these regions
are rich in aspartic-acid residues and could be involved in the catalytic mechanism and/or the binding of the substrates;
signature patterns were developed for both regions. Possible additional members of this family of proteins are: - Bacillus
subtilis spore germination protein C3 (gene gerC3). Both proteins are most probably also enzymes involved in isoprenoid
metabolism [6].

[1256] Consensus pattern: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH].
Consensus pattern: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG].
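The `x(2,4)` element in the first pattern is a variable-length gap: the third aspartate may sit two to four residues after the D-D pair. A minimal Python sketch of that behaviour, using a hand-translated regex and hypothetical probe peptides built by a made-up helper `probe`, shows that only gap lengths 2 to 4 are accepted.

```python
import re

# [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH], translated by hand:
# x(2,4) becomes .{2,4}, i.e. a gap of two to four arbitrary residues.
FIRST_ASP_RICH = re.compile(r"[LIVM]{2}.DD.{2,4}D.{4}RR[GH]")

def probe(gap):
    """Build a hypothetical test peptide with `gap` residues between D-D and D."""
    return "LLADD" + "A" * gap + "DAAAARRG"

for gap in range(1, 6):
    print(gap, bool(FIRST_ASP_RICH.search(probe(gap))))
```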

[ 1] Ashby M.N., Edwards P.A. J. Biol. Chem. 265:13157-13164(1990).

[ 2] Fujisaki S., Hara H., Nishimura Y., Horiuchi K., Nishino T. J. Biochem. 108:995-1000(1990).

[ 3] Carattoli A., Romano N., Ballario P., Morelli G., Macino G. J. Biol. Chem. 266:5854-5859(1991).

[ 4] Kuntz M., Roemer S., Suire C., Hugueney P., Weil J.-H., Schantz R., Camara B. Plant J. 2:25-34(1992).

[ 5] Math S.K., Hearst J.E., Poulter C.D. Proc. Natl. Acad. Sci. U.S.A. 89:6761-6764(1992).

[ 6] Bairoch A. Unpublished observations (1993).

[1257] 462. Potato inhibitor I family signature 

The potato inhibitor I family is one of the numerous families of serine proteinase inhibitors. Members of this protein
family are found in plants: in the seeds of barley or beans [1,2,3], and in potato or tomato leaves, where they accumulate
in response to mechanical damage [4,5]. An inhibitor belonging to this family is also found in leech [6]. It is interesting
to note that, currently, this is the only proteinase inhibitor family to be found in both the plant and animal kingdoms.
Structurally, these inhibitors are small (60 to 90 residues) and, in contrast with other families of protease inhibitors, they
lack disulfide bonds. They have a single inhibitory site. The consensus pattern includes three out of the four residues
conserved in all members of this family and is located in the N-terminal half.

Consensus pattern: [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A. Barley subtilisin-chymotrypsin inhibitor-2b has
Glu instead of Gly. There is a trypsin inhibitor from the cucurbitaceae Momordica charantia [7] which is said to belong
to the potato inhibitor I family but which shows only a very weak similarity with the other members of this family.
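One way to accommodate the barley inhibitor-2b exception noted above is to widen the Gly position to [GE]. The Python sketch below compares the strict pattern with such a relaxed form; the relaxed pattern and the test peptides are hypothetical and not part of the published signature. Widening a position in this way also raises the chance of false positives, so the strict pattern remains the reference.

```python
import re

# [FYW]-P-[EQH]-[LIV](2)-G-x(2)-[STAGV]-x(2)-A, translated by hand.
STRICT = re.compile(r"[FYW]P[EQH][LIV]{2}G.{2}[STAGV].{2}A")
# Hypothetical relaxation: Gly or Glu tolerated at the Gly position.
RELAXED = re.compile(r"[FYW]P[EQH][LIV]{2}[GE].{2}[STAGV].{2}A")

for peptide in ("WPEVVGKKSKKA", "WPEVVEKKSKKA"):  # made-up test peptides
    print(peptide, bool(STRICT.search(peptide)), bool(RELAXED.search(peptide)))
```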

[ 1] Svendsen I., Hejgaard J., Chavan J.K. Carlsberg Res. Commun. 49:493-502(1984).
[ 2] Svendsen I., Boisen S., Hejgaard J. Carlsberg Res. Commun. 47:45-53(1982).

[ 3] Nozawa H., Yamagata H., Aizono Y., Yoshikawa M., Iwasaki T. J. Biochem. 1(^:1003-1008(1989).

[ 4] Cleveland T.E., Thornburg R.W., Ryan C.A. Plant Mol. Biol. 8:199-207(1987).

[ 5] Lee J.S., Brown W.E., Graham J.S., Pearce G., Fox E.A., Dreher T.W., Ahern K.G., Pearson G.D., Ryan C.A.
Proc. Natl. Acad. Sci. U.S.A. 83:7277-7281(1986).

[ 6] Seemuller U., Eulitz M., Fritz H., Strobl A. Hoppe-Seyler's Z. Physiol. Chem. 361:1841-1846(1980).
[ 7] Zeng R.-Y., Qian R.-Q., Wang Y. FEBS Lett. 234:35-38(1988).

[1258] 463. (pp binding) Phosphopantetheine attachment site

Phosphopantetheine (or pantetheine 4'-phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some
multienzyme complexes, where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid
groups [1]. Phosphopantetheine is attached to a serine residue in these proteins [2]. ACP proteins or domains have
been found in various enzyme systems, which are listed below (references are only provided for recently determined
sequences). - Fatty acid synthetase (FAS), which catalyzes the formation of long-chain fatty acids from acetyl-CoA,
malonyl-CoA and NADPH. Bacterial and plant chloroplast FAS are composed of eight separate subunits which
correspond to the different enzymatic activities; ACP is one of these polypeptides. Fungal FAS consists of two
multifunctional proteins, FAS1 and FAS2; the ACP domain is located in the N-terminal section of FAS2. Vertebrate FAS
consists of a single multifunctional enzyme; the ACP domain is located between the beta-ketoacyl reductase domain
and the C-terminal thioesterase domain [3]. - Polyketide antibiotic synthase enzyme systems. Polyketides are secondary
metabolites produced from simple fatty acids by microorganisms and plants. ACP is one of the polypeptidic components
involved in the biosynthesis of the Streptomyces polyketide antibiotics actinorhodin, curamycin, granatacin, monensin,
oxytetracycline and tetracenomycin C. - Bacillus subtilis putative polyketide synthases pksK, pksL and pksM, which
respectively contain three, five and one ACP domains. - The multifunctional 6-methylsalicylic acid synthase (MSAS)
from Penicillium patulum. This is a multifunctional enzyme involved in the biosynthesis of a polyketide antibiotic, and
it contains an ACP domain in the C-terminal extremity. - Multifunctional mycocerosic acid synthase (gene mas)
from Mycobacterium bovis. - Gramicidin S synthetase I (gene grsA) from Bacillus brevis. This enzyme catalyzes the
first step in the biosynthesis of the cyclic antibiotic gramicidin S. - Tyrocidine synthetase I (gene tycA) from Bacillus
brevis. The reaction carried out by tycA is identical to that catalyzed by grsA. - Gramicidin S synthetase II (gene grsB)
from Bacillus brevis. This enzyme is a multifunctional protein that activates and polymerizes proline, valine, ornithine
and leucine. GrsB contains four ACP domains. - Erythronolide synthase proteins 1, 2 and 3 from Saccharopolyspora
erythraea, which are involved in the biosynthesis of the polyketide antibiotic erythromycin. Each of these proteins
contains two ACP domains. - Conidial green pigment synthase from Aspergillus nidulans. - ACV synthetase from various
fungi. This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin. It contains three ACP
domains. - Enterobactin synthetase component F (gene entF) from Escherichia coli. This enzyme is involved in the
ATP-dependent activation of serine during enterobactin (enterochelin) biosynthesis. - Cyclic peptide antibiotic surfactin
synthase subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain three related domains, while subunit 3 only
contains a single domain. - HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum. This enzyme synthesizes
HC-toxin, a cyclic tetrapeptide. HTS1 contains four ACP domains. - Fungal mitochondrial ACP [9], which is part of the
respiratory chain NADH dehydrogenase (complex I). - Rhizobium nodulation protein nodF, which probably acts as an ACP
in the synthesis of the nodulation Nod factor fatty acyl chain. The sequence around the phosphopantetheine attachment
site is conserved in all these proteins and can be used as a signature pattern. A profile was also developed that spans
the complete ACP-like domain.

[1259] Consensus pattern: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-{PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA] [S is the pantetheine
attachment site]
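This pattern combines allowed sets, a forbidden set (`{PCFY}`) and a marked serine. A Python sketch that hand-translates it into a regex with a capture group around that serine, and reports the 1-based position of each candidate attachment site, might look as follows; the function name `attachment_serines` and the ACP-like test fragment are illustrative assumptions.

```python
import re

# Hand translation of the pattern: {PCFY} becomes the negated class [^PCFY],
# and the capture group marks the serine that carries phosphopantetheine.
PP_SITE = re.compile(
    r"[DEQGSTALMKRH][LIVMFYSTAC][GNQ][LIVMFYAG][DNEKHS](S)"
    r"[LIVMST][^PCFY][STAGCPQLIVMF][LIVMATN][DENQGTAKRHLM]"
    r"[LIVMWSTA][LIVGSTACR].{2}[LIVMFA]"
)

def attachment_serines(seq):
    """Return 1-based positions of serines matched as attachment sites."""
    return [m.start(1) + 1 for m in PP_SITE.finditer(seq)]

print(attachment_serines("FVEDLGADSLDTVELVMALEEEF"))  # illustrative ACP-like fragment
```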

[ 1] Concise Encyclopedia Biochemistry, Second Edition, Walter de Gruyter, Berlin, New York (1988).
[ 2] Pugh E.L., Wakil S.J. J. Biol. Chem. 240:4727-4733(1965).

[ 3] Witkowski A., Rangan V.S., Randhawa Z.I., Amy C.M., Smith S. Eur. J. Biochem. 198:571-579(1991).

[ 6] Scotti C., Piatti M., Cuzzoni A., Perani P., Tognoni A., Grandi G., Galizzi A., Albertini A.M. Gene 130:65-71(1993).

[ 9] Sackmann U., Zensen R., Rohlen D., Jahnke U., Weiss H. Eur. J. Biochem. 200:463-469(1991).
[1260] 464. (Prenyltrans) Terpene synthases signature 

The following enzymes catalyze mechanistically related reactions which involve the highly complex cyclic rearrangement
of squalene or its 2,3-oxide: - Lanosterol synthase (EC 5.4.99.7) (oxidosqualene-lanosterol cyclase), which
catalyzes the cyclization of (S)-2,3-epoxysqualene to lanosterol, the initial precursor of cholesterol, steroid hormones
and vitamin D in vertebrates and of ergosterol in fungi (gene ERG7). - Cycloartenol synthase (EC 5.4.99.8)
(2,3-epoxysqualene-cycloartenol cyclase), a plant enzyme that catalyzes the cyclization of (S)-2,3-epoxysqualene to
cycloartenol. - Hopene synthase (EC 5.4.99.-) (squalene-hopene cyclase), a bacterial enzyme that catalyzes the
cyclization of squalene into hopene, a key step in hopanoid (triterpenoid) metabolism. These enzymes are evolutionarily
related [1] proteins of about 70 to 85 Kd. As a signature pattern, a highly conserved region was selected which is rich in
aromatic residues and which is located in the C-terminal section.

[1261] Consensus pattern: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA].

[1262] [ 1] Corey E.J., Matsuda S.P.T., Bartel B. Proc. Natl. Acad. Sci. U.S.A. 90:11628-11632(1993).

[1263] 465. Prion protein signatures 

Prion protein (PrP) [1,2,3] is a small glycoprotein found in high quantity in the brains of humans or animals infected
with a number of degenerative neurological diseases such as Kuru, Creutzfeldt-Jakob disease (CJD), scrapie or bovine
spongiform encephalopathy (BSE). PrP is encoded in the host genome and expressed both in normal and infected
cells. It has a tendency to aggregate, yielding polymers called rods. Structurally, PrP is a protein consisting of a signal
peptide, followed by an N-terminal domain that contains tandem repeats of a short motif (PHGGGWGQ in mammals,
PHNPGY in chicken), itself followed by a highly conserved domain; finally comes a C-terminal hydrophobic domain that is
post-translationally removed when PrP is attached to the extracellular side of the cell membrane by a GPI-anchor. The
structure of PrP is shown in the following schematic representation:

[Schematic: Sig | Tandem repeats | C ... C | GPI]

'C': conserved cysteine involved in a disulfide bond. As signature patterns for PrP, a perfectly conserved
alanine- and glycine-rich region of 16 residues was selected, as well as a region centered on the second cysteine
involved in the disulfide bond.

[1264] Consensus pattern: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y.

Consensus pattern: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y [C is involved in a disulfide bond]
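Because the first PrP signature contains no variable positions, a plain substring test is enough; the second needs a regular expression. The following Python sketch checks both and reports the 1-based position of the disulfide cysteine in signature 2; the function `prp_signatures` and the test sequence are hypothetical.

```python
import re

# PrP signature 1 has no wildcard positions, so a substring test suffices.
PRP_SIG1 = "AGAAAAGAVVGGLGGY"
# PrP signature 2, hand-translated; the capture group marks the disulfide cysteine.
PRP_SIG2 = re.compile(r"E.[ED].K[LIVM]{2}.[KR][LIVM]{2}.[QE]M(C)..QY")

def prp_signatures(seq):
    """Report which PrP signatures occur; for signature 2, the 1-based Cys position."""
    m = PRP_SIG2.search(seq)
    return {
        "sig1": PRP_SIG1 in seq,
        "sig2_cys": m.start(1) + 1 if m else None,
    }

print(prp_signatures("GG" + PRP_SIG1 + "SS" + "EADAKLLAKLLAQMCAAQY"))  # made-up sequence
```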
[ 1] Stahl N., Prusiner S.B. FASEB J. 5:2799-2807(1991). 

[ 2] Brunori M., Chiara Silvestrini M., Pocchiari M. Trends Biochem. Sci. 13:309-313(1988).
[ 3] Prusiner S.B. Annu. Rev. Microbiol. 43:345-374(1989). 

[1265] 466. Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature and profile (pro isomerase) 
Cyclophilin [1] is the major high-affinity binding protein in vertebrates for the immunosuppressive drug cyclosporin A
(CSA). It exhibits a peptidyl-prolyl cis-trans isomerase activity (EC 5.2.1.8) (PPIase or rotamase). PPIase is an enzyme
that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides
[2]. It is probable that CSA mediates some of its effects via an inhibitory action on PPIase. Cyclophilin is a cytosolic
protein which belongs to a family [3,4,5] that also includes the following isozymes: - Cyclophilin B (or S-cyclophilin), a
PPIase which is retained in an endoplasmic reticulum compartment. - Cyclophilin C, a cytoplasmic PPIase. -
Mitochondrial matrix cyclophilin (cyp3). - A PPIase which seems specific for the folding of rhodopsin and is an integral
membrane protein anchored by a C-terminal transmembrane region. This protein was first characterized in Drosophila
(gene ninaA). - Bacterial periplasmic PPIase (gene ppiA). - Bacterial cytosolic PPIase (gene ppiB). - Natural-killer cell
cyclophilin-related protein. This large protein (about 160 Kd) is a component of a putative tumor-recognition complex
involved in the function of NK cells. It contains a cyclophilin-type PPIase domain. - Mammalian nucleoporin Nup358 [6], a
nuclear pore complex protein of 358 Kd that contains a C-terminal cyclophilin-type PPIase domain. - Yeast hypothetical
protein YJR032W. - Fission yeast hypothetical protein SpAC21E11.05c. - Caenorhabditis elegans hypothetical protein
T27D1.1. The sequences of the different forms of cyclophilin-type PPIases are well conserved. As a signature pattern,
a conserved region was selected in the central part of these enzymes.

[1266] Consensus pattern: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G. FKBPs, a
family of proteins that bind the immunosuppressive drug FK506, are also PPIases, but their sequence is not at all
related to that of cyclophilin.

[ 1] Stamnes M.A., Rutherford S.L., Zuker C.S. Trends Cell Biol. 2:272-276(1992).

[ 2] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).

[ 3] Trandinh C.C., Pao G.M., Saier M.H. Jr. FASEB J. 6:3410-3420(1992).

[ 4] Galat A. Eur. J. Biochem. 216:689-707(1993).

[ 5] Hacker J., Fischer G. Mol. Microbiol. 10:445-456(1993).

[ 6] Wu J., Matunis M.J., Kraemer D., Blobel G., Coutavas E. J. Biol. Chem. 270:14209-14213(1995).
[1267] 467. Profilin signature

Profilin [1,2] is a small eukaryotic protein that binds to monomeric actin (G-actin) in a 1:1 ratio, thus preventing the
polymerization of actin into filaments (F-actin). It can also, in certain circumstances, promote actin polymerization.
Profilin also binds to polyphosphoinositides such as PIP2. Overall sequence similarity among profilins from organisms
which belong to different phyla (ranging from fungi to mammals) is low, but the N-terminal region is relatively well
conserved. That region is thought to be involved in the binding to actin. The signature pattern for profilin is based on
conserved residues at the N-terminal extremity. A protein structurally similar to profilin is present in the genome of
variola and vaccinia viruses (gene A42R).
[1268] Consensus pattern: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ]
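The leading `<` anchors this pattern to the N-terminal extremity, which in Python corresponds to `re.match` (a match attempted only at the start of the string) rather than `re.search`. A minimal sketch, with a hypothetical helper name and made-up test sequences:

```python
import re

# <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ], translated by hand;
# the "<" anchor is honoured by using re.match instead of re.search.
PROFILIN = re.compile(r".{0,1}[STA].{0,1}W[DENQH].[YI].[DEQ]")

def has_profilin_signature(seq):
    """True if the signature occurs at the N-terminus of `seq`."""
    return PROFILIN.match(seq) is not None

print(has_profilin_signature("MAGWNAYIDNLM"))       # motif at the N-terminus
print(has_profilin_signature("KKKKKMAGWNAYIDNLM"))  # same motif, shifted inward
```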

[ 1] Haarer B.K., Brown S.S. Cell Motil. Cytoskeleton 17:71-74(1990). 
[ 2] Sohn R.H., Goldschmidt-Clermont P.J. BioEssays 16:465-472(1994).

[1269] 468. Protamine P1 signature

Protamines are small, highly basic proteins that substitute for histones in sperm chromatin during the haploid phase
of spermatogenesis. They pack sperm DNA into a highly condensed, stable and inactive complex. There are two
different types of mammalian protamine, called P1 and P2. P1 has been found in all species studied, while P2 is
sometimes absent. There seems to be a single type of avian protamine whose sequence is closely related to that of
mammalian P1 [1]. As a signature for this family of proteins, a conserved region was selected at the N-terminal extremity
of the sequence.

[1270] Consensus pattern: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S.
[1271] [ 1] Oliva R., Goren R., Dixon G.H. J. Biol. Chem. 264:17627-17630(1989).
[1272] 469. Sperm histone P2 (protamine P2) 

This protein, also known as protamine P2, can substitute for histones in the chromatin of sperm. The alignment contains
both the sequence of the mature P2 protein and its propeptide.
[1273] 470. Proteasome A-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes the
proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring) of
about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups, A
and B. Subunits that belong to the A-type group are proteins of 210 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below: - Vertebrate subunits
C2 (nu), C3, C8, C9, iota and zeta. - Drosophila PROS-25, PROS-28.1, PROS-29 and PROS-35. - Yeast C1 (PRS1),
C5 (PRS3), C7-alpha (Y8) (PRS2), Y7, Y13, PRE5, PRE6 and PUP2. - Arabidopsis thaliana subunits alpha and PSM30.
- Thermoplasma acidophilum alpha-subunit. In this archaebacterium the proteasome is composed of only two different
subunits. As a signature pattern for proteasome A-type subunits, the best conserved region was selected, which is
located in the N-terminal part of these proteins.

[1274] Consensus pattern: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-[SAG].
These proteins belong to family T1 in the classification of peptidases [6,E2].

[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).
[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).
[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).
[ 4] Wilk S. Enzyme Protein 47:187-188(1993).

[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).
[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1275] Proteasome B-type subunits signature 

The proteasome (or macropain) (EC 3.4.99.46) [1 to 5,E1] is a eukaryotic and archaebacterial multicatalytic proteinase
complex that seems to be involved in an ATP/ubiquitin-dependent nonlysosomal proteolytic pathway. In eukaryotes
the proteasome is composed of about 28 distinct subunits which form a highly ordered ring-shaped structure (20S ring)
of about 700 Kd. Most proteasome subunits can be classified, on the basis of sequence similarities, into two groups,
A and B. Subunits that belong to the B-type group are proteins of 190 to 290 amino acids that share a number of
conserved sequence regions. Subunits that are known to belong to this family are listed below: - Vertebrate subunits
C5, beta, delta, epsilon, theta (C10-II), LMP2/RING12, C13 (LMP7/RING10), C7-I and MECL-1. - Yeast PRE1, PRE2
(PRG1), PRE3, PRE4, PRS3, PUP1 and PUP3. - Drosophila L(3)73Ai. - Fission yeast pts1. - Thermoplasma
acidophilum beta-subunit. In this archaebacterium the proteasome is composed of only two different subunits. As a
signature pattern for proteasome B-type subunits, the best conserved region was selected, which is located in the
N-terminal part of these proteins.

[1276] Consensus pattern: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-
[GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D. These proteins belong to family T1 in the classification of
peptidases [6,E2].
[ 1] Rivett A.J. Biochem. J. 291:1-10(1993).

[ 2] Rivett A.J. Arch. Biochem. Biophys. 268:1-8(1989).

[ 3] Goldberg A.L., Rock K.L. Nature 357:375-379(1992).

[ 4] Wilk S. Enzyme Protein 47:187-188(1993).

[ 5] Hilt W., Wolf D.H. Trends Biochem. Sci. 21:96-102(1996).

[ 6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).



[1277] 471. (pyr redox) Pyridine nucleotide-disulphide oxidoreductases class-I active site
The pyridine nucleotide-disulphide oxidoreductases are FAD flavoproteins which contain a pair of redox-active
cysteines involved in the transfer of reducing equivalents from the FAD cofactor to the substrate. On the basis of
sequence and structural similarities [1] these enzymes can be classified into two categories. The first category groups
together the following enzymes [2 to 6]: - Glutathione reductase (EC 1.6.4.2) (GR). - Higher eukaryote thioredoxin
reductase (EC 1.6.4.5). - Trypanothione reductase (EC 1.6.4.8). - Lipoamide dehydrogenase (EC 1.8.1.4), the E3
component of alpha-ketoacid dehydrogenase complexes. - Mercuric reductase (EC 1.16.1.1). The sequence around
the two cysteines involved in the redox-active disulfide bond is conserved and can be used as a signature pattern.
[1278] Consensus pattern: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P [The two C's form the active site disulfide bond]. In
positions 6 and 7 of the pattern all known sequences have Asn-(Val/Ile), with the exception of GR from plant chloroplasts
and from cyanobacteria, which have Ile-Arg [7].
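The note about positions 6 and 7 can be checked directly by capturing the `x(2)` pair along with the two cysteines. The Python sketch below does this with a hand-translated regex; the function `redox_disulfide` and the test fragments are hypothetical.

```python
import re

# G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P; groups capture the two redox-active
# cysteines and the x(2) pair (pattern positions 6 and 7).
PYR_REDOX = re.compile(r"GG.(C)[LIVA](..)G(C)[LIVM]P")

def redox_disulfide(seq):
    """Return 1-based Cys positions and the residues at pattern positions 6-7."""
    m = PYR_REDOX.search(seq)
    if m is None:
        return None
    return {
        "cys": (m.start(1) + 1, m.start(3) + 1),
        "positions_6_7": m.group(2),
    }

print(redox_disulfide("TGGTCVNVGCVPK"))  # Asn-Val at positions 6-7
print(redox_disulfide("TGGTCVIRGCVPK"))  # Ile-Arg, as in the noted exceptions
```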

[ 1] Kuriyan J., Krishna T.S.R., Wong L., Guenther B., Pahler A., Williams C.H. Jr., Model P. Nature 352:172-174(1991).

[ 2] Rice D.W., Schulz G.E., Guest J.R. J. Mol. Biol. 174:483-496(1984).
[ 3] Brown N.L. Trends Biochem. Sci. 10:400-402(1985).

[ 4] Carothers D.J., Pons G., Patel M.S. Arch. Biochem. Biophys. 268:409-425(1989).

[ 5] Walsh C.T., Bradley M., Nadeau K. Trends Biochem. Sci. 16:305-309(1991).

[ 6] Gasdaska P.Y., Gasdaska J.R., Cochran S., Powis G. FEBS Lett. 373:5-9(1995).

[ 7] Creissen G., Edwards E.A., Enard C., Wellburn A., Mullineaux P. Plant J. 2:129-131(1991).

[1279] 472. (pyridoxal deC) DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site (pyridoxal deC) 
Four different enzymes, all pyridoxal-dependent decarboxylases, seem to share regions of sequence similarity
[1,2,3,4], especially in the vicinity of the lysine residue which serves as the attachment site for the pyridoxal-phosphate
(PLP) group. These enzymes are: - Glutamate decarboxylase (EC 4.1.1.15) (GAD). Catalyzes the decarboxylation of
glutamate into the neurotransmitter GABA (4-aminobutanoate). - Histidine decarboxylase (EC 4.1.1.22) (HDC).
Catalyzes the decarboxylation of histidine to histamine. There are two completely unrelated types of HDC: those that use
PLP as a cofactor (found in Gram-negative bacteria and mammals), and those that contain a covalently bound pyruvoyl
residue (found in Gram-positive bacteria). - Aromatic-L-amino-acid decarboxylase (EC 4.1.1.28) (DDC), also known
as L-dopa decarboxylase or tryptophan decarboxylase. DDC catalyzes the decarboxylation of tryptophan to tryptamine.
It also acts on 5-hydroxy-tryptophan and dihydroxyphenylalanine (L-dopa). - Tyrosine decarboxylase (EC 4.1.1.25)
(TyrDC), which converts tyrosine into tyramine, a precursor of isoquinoline alkaloids and various amides. These enzymes
are collectively known as group II decarboxylases [3,4].

[1280] Consensus pattern: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVMFYWQ]-x(2)-
[RK] [K is the pyridoxal-P attachment site]

[ 1] Jackson F.R. J. Mol. Evol. 31:325-329(1990).

[ 2] Joseph D.R., Sullivan P., Wang Y.-M., Kozak C., Fenstermacher D.A., Behrendsen M.E., Zahnow C.A. Proc.
Natl. Acad. Sci. U.S.A. 87:733-737(1990).

[ 3] Sandmeier E., Hale T.I., Christen P. Eur. J. Biochem. 221:997-1002(1994).

[ 4] Ishii S., Mizuguchi H., Nishino J., Hayashi H., Kagamiyama H. J. Biochem. 120:369-376(1996).

[1281] 473. Regulator of chromosome condensation (RCC1) signatures (RCC1)

The regulator of chromosome condensation (RCC1) [1] is a eukaryotic protein which binds to chromatin and interacts
with ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting
as a guanine-nucleotide dissociation stimulator (GDS) [2]. The interaction of RCC1 with ran probably plays an important
role in the regulation of gene expression. RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in
Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. As shown in
the following schematic representation, the repeats make up the major part of the length of the protein. Outside the
repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a
C-terminal domain of about 130 residues.

[Schematic: N-terminal | Rpt. 1 | Rpt. 2 | Rpt. 3 | Rpt. 4 | Rpt. 5 | Rpt. 6 | Rpt. 7 | C-terminal (in Drosophila)]

Two signature patterns for RCC1 were developed. The first is found in the N-terminal part
of the second repeat; this is the most conserved part of RCC1. The second is derived from conserved positions in the
C-terminal part of each repeat and detects up to five copies of the repeated domain. The RCC1-type of repeat is also
found in the X-linked retinitis pigmentosa GTPase regulator [3].

[1282] Consensus pattern: G-x-N-D-x(2)-[AV]-L-G-R-x-T

Consensus pattern: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
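Because the second RCC1 pattern is built from positions found in the C-terminal part of each repeat, scanning a sequence with it yields one hit per detectable repeat copy. The following Python sketch assumes the pattern reads [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM] and counts non-overlapping hits in a hypothetical synthetic sequence carrying three copies; the names `RCC1_REPEAT` and `repeat_hits` are illustrative only.

```python
import re

# Second RCC1 pattern, read as
# [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM]
RCC1_REPEAT = re.compile(r"[LIVMFA][STAGC]{2}G.{2}H[STAGLI][LIVMFA].[LIVM]")


def repeat_hits(seq: str) -> list[int]:
    """Return 1-based start positions of non-overlapping repeat-pattern hits."""
    return [m.start() + 1 for m in RCC1_REPEAT.finditer(seq)]


# Hypothetical sequence: three copies of a matching 11-residue stretch,
# each preceded by a 40-residue spacer standing in for the rest of a repeat.
unit = "LSSGAAHSLAL"
seq = ("E" * 40 + unit) * 3
hits = repeat_hits(seq)
print(len(hits), hits)  # 3 [41, 92, 143]
```

In real RCC1 sequences not every repeat satisfies the pattern; as the text notes, it detects up to five copies of the repeated domain.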

[ 1] Dasso M. Trends Biochem. Sci. 18:96-101(1993).

[ 2] Boguski M.S., McCormick F. Nature 366:643-654(1993). 

[ 3] Roepman R., Van Duijnhoven G., Rosenberg T., Pinckers A.J.L.G., Bleeker-Wagemakers L.M., Bergen A.A.B.,
Post J., Beck A., Reinhardt R., Ropers H.-H., Cremers F., Berger W. Hum. Mol. Genet. 5:1035-1041(1996).

[1283] 474. RNA 3'-terminal phosphate cyclase signature (RCT)

RNA 3'-terminal phosphate cyclase (EC 6.5.1.4) [1,2] catalyzes the conversion of a 3'-phosphate to a 2',3'-cyclic
phosphodiester at the end of RNA. The biological role of this enzyme is unknown, but it is likely to function in some
aspects of cellular RNA processing. The reaction catalyzed by the enzyme occurs in three steps: 1) adenylation of the
enzyme by ATP; 2) the enzyme acts on the RNA 3'-terminal phosphate to produce an RNA 3'-terminal diphosphate
adenylate; 3) release of AMP and cyclization by a non-catalytic nucleophilic attack of the adjacent 2'-hydroxyl on the
phosphorus in the diester linkage. This enzyme, which has been characterized in human (where there seem to be at
least three isozymes) and Escherichia coli (gene rtcA), seems to be taxonomically widespread. It is found in insects,
plants, fungi (gene RTC1 in yeast) and archaebacteria. RNA cyclase is a protein of 36 to 42 Kd. The best conserved
region, which is used as a signature pattern, is a glycine-rich stretch of residues located in the central part of the
sequence; it is reminiscent of various ATP-, GTP- or AMP-binding glycine-rich loops. In this context, the conserved Arg
(His in the E. coli enzyme) could be the AMP-binding residue.
[1284] Consensus pattern: [RH]-G-x(2)-P-x-G(3)-x-[LIV]

[ 1] Genschik P., Billy E., Swianiewicz M., Filipowicz W. EMBO J. 16:2955-2967(1997).
[ 2] Filipowicz W., Vicente O. Meth. Enzymol. 181:499-510(1990).

[1285] 475. REV protein (anti-repression trans-activator protein)

[1286] 476. Prokaryotic-type class I peptide chain release factors signature (RF-1)

Peptide chain release factors (RFs) are required for the termination of protein biosynthesis [1]. At present two classes
of RFs can be distinguished. Class I RFs bind to ribosomes that have encountered a stop codon at their decoding site
and induce release of the nascent polypeptide. Class II RFs are GTP-binding proteins that interact with class I RFs
and enhance class I RF activity. In prokaryotes there are two class I RFs that act in a codon-specific manner [2]: RF-1
(gene prfA) mediates UAA- and UAG-dependent termination, while RF-2 (gene prfB) mediates UAA- and UGA-dependent
termination. RF-1 and RF-2 are structurally and evolutionarily related proteins which have been shown [3] to make up
a family that also contains the following proteins: - Fungal MRF1, a mitochondrial RF (m-RF) which recognizes the
UAA and UAG codons. - Escherichia coli RF-H, a protein of unknown function. - Escherichia coli hypothetical protein
yaeJ and a close Pseudomonas putida homolog. A highly conserved region, located in the central part of the 40 to 45
Kd RF-1/2 and m-RF and in the N-terminal part of the 15 to 16 Kd RF-H and yaeJ, is used as a signature pattern.
[1287] Consensus pattern: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV]

Note that prokaryotic-type class I RFs display no significant sequence similarity either to prokaryotic-type class II RFs,
which belong to the family of GTP-binding elongation factors, or to eukaryotic class I or class II RFs.

[ 1] Tate W.P., Poole E.S., Mannering S.M. Prog. Nucleic Acid Res. Mol. Biol. 52:293-335(1996).
[ 2] Craigen W.J., Lee C.C., Caskey C.T. Mol. Microbiol. 4:861-865(1990).
[ 3] Pel H.J., Rep M., Grivell L.A. Nucleic Acids Res. 20:4423-4428(1992).

[1288] 477. RIO1/ZK632.3/MJ0444 family signature 

The following uncharacterized proteins are evolutionarily related [1]: - Yeast protein RIO1. - Caenorhabditis elegans
hypothetical protein ZK632.3. - Methanococcus jannaschii hypothetical protein MJ0444. - Thermoplasma acidophilum
hypothetical protein in the rpoA2 3'-region. The eukaryotic members of this family are proteins of about 55 to 60 Kd,
while the archaebacterial ones are half that size. The central part of these proteins is highly conserved. The best
conserved region is used as a signature pattern.

[1289] Consensus pattern: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM]
[1290] [ 1] Bairoch A. Unpublished observations (1997).






EP1 033 405 A2 



[1291] 478. (RIP) Shiga/ricin ribosomal inactivating toxins active site signature. A number of bacterial and plant toxins
act by inhibiting protein synthesis in eukaryotic cells. The toxins of the Shiga and ricin family inactivate 60S ribosomal
subunits by an N-glycosidic cleavage which releases a specific adenine base from the sugar-phosphate backbone of
28S rRNA [1,2,3]. The toxins which are known to function in this manner are: - Shiga toxin from Shigella dysenteriae
[4]. This toxin is composed of one copy of an enzymatically active A subunit and five copies of a B subunit responsible
for binding the toxin complex to specific receptors on the target cell surface. - Shiga-like toxins (SLT) are a group of
Escherichia coli toxins very similar in their structure and properties to Shiga toxin. The sequences of two types of these
toxins, SLT-1 [5] and SLT-2 [6], are known. - Ricin, a potent toxin from castor bean seeds. Ricin consists of two
glycosylated chains linked by a disulfide bond. The A chain is enzymatically active. The B chain is a lectin with a binding
preference for galactosides. Both chains are encoded by a single polypeptide precursor. Ricin is classified as a
type-II ribosome-inactivating protein (RIP); other members of this family are agglutinin, also from castor bean, and abrin
from the seeds of the bean Abrus precatorius [7]. - Single-chain ribosome-inactivating proteins (type-I RIP) from plants.
Examples of such proteins are: barley protein synthesis inhibitors I and II, Mongolian snake-gourd trichosanthin, sponge
gourd luffin-A and -B, garden four-o'clock MAP, common pokeberry PAP-S and soapwort saporin-6 [7]. All these toxins
are structurally related. A conserved glutamic acid residue has been implicated [8] in the catalytic mechanism; it is
located near a conserved arginine which also plays a role in catalysis [9]. The signature that has been developed for
these proteins includes these catalytic residues.

[1292] Consensus pattern: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FYHRKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF] [E and R are active site residues]
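The bracketed note in [1292] marks the E and R of the pattern as active-site residues. A minimal Python sketch, assuming a plain one-letter amino-acid string, writes the pattern as a hand-translated regular expression with named groups on those two positions, so that each hit reports where the catalytic pair lies. The names `RIP_SITE` and `rip_active_sites` and both test sequences are hypothetical.

```python
import re

# [1292] pattern with named groups on the two catalytic residues:
# [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FYHRKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF]
RIP_SITE = re.compile(
    r"[LIVMA].[LIVMSTA]{2}."
    r"(?P<glu>E)[SAGV][STAL](?P<arg>R)"
    r"[FYHRKNQS].[LIVM][EQS].{2}[LIVMF]"
)


def rip_active_sites(seq: str) -> list[tuple[int, int]]:
    """Return 1-based (Glu, Arg) positions for each non-overlapping pattern hit."""
    return [(m.start("glu") + 1, m.start("arg") + 1)
            for m in RIP_SITE.finditer(seq.upper())]


# Hypothetical sequences: the second differs only by E -> Q at the catalytic Glu.
print(rip_active_sites("MKP" + "AGLLGESSRFGLEGGL" + "DD"))  # [(9, 12)]
print(rip_active_sites("MKP" + "AGLLGQSSRFGLEGGL" + "DD"))  # []
```

Because the gaps between E and R are fixed (two residues), the Arg is always three positions after the Glu in any hit.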

[1293] [ 1] Endo Y., Tsurugi K., Takeda Y., Ogasawara T., Igarashi K. Eur. J. Biochem. 171:45-50(1988).
[ 2] May M.J., Hartley M.R., Roberts L.M., Krieg P.A., Osborn R.W., Lord J.M. EMBO J. 8:301-308(1989).
[ 3] Funatsu G., Islam M.R., Minami Y., Sung-Sil K., Kimura M. Biochimie 73:1157-1161(1991).
[ 4] Strockbine N.A., Jackson M.P., Sung L.M., Holmes R.K., O'Brien A.D. J. Bacteriol. 170:1116-1122(1988).
[ 5] Calderwood S.B., Auclair P., Donohue-Rolfe A., Keusch G.T., Mekalanos J.J. Proc. Natl. Acad. Sci. U.S.A. 84:4364-4368(1987).
[ 6] Jackson M.P., Neill R.J., O'Brien A.D., Holmes R.K., Newland J.W. FEMS Microbiol. Lett. 44:109-114(1987).
[ 7] Barbieri L., Battelli M.G., Stirpe F. Biochim. Biophys. Acta 1154:237-282(1993).
[ 8] Hovde C.J., Calderwood S.B., Mekalanos J.J., Collier R.J. Proc. Natl. Acad. Sci. U.S.A. 85:2568-2572(1988).
[ 9] Monzingo A.R., Collins E.J., Ernst S.R., Irvin J.D., Robertus J.D. J. Mol. Biol. 233:705-715(1993).

[1294] 479. Bacterial RNA polymerase, alpha chain (RNA pol A bac) 

Members of this family include alpha subunits from eubacteria and from chloroplasts. The alpha subunit
of RNA polymerase consists of two independently folded domains, referred to as the amino-terminal and carboxyl-terminal
domains. The amino-terminal domain is involved in the interaction with the other subunits of the RNA polymerase. The
carboxyl-terminal domain interacts with the DNA and activators. The amino acid sequence of the alpha subunit is
conserved in prokaryotic and chloroplast RNA polymerases. There are three regions of particularly strong conservation,
two in the amino-terminal domain and one in the carboxyl-terminal domain [3].

[1] Zhang G, Darst SA; Science 1998;281:262-266.
[2] Jeon YH, Negishi T, Shirakawa M, Yamazaki T, Fujita N, Ishihama A, Kyogoku Y; Science 1995;270:1495-1497.
[3] Ebright RH, Busby S; Curr Opin Genet Dev 1995;5:197-203.
[4] Murakami K, Kimura M, Owens JT, Meares CF, Ishihama A; Proc Natl Acad Sci USA 1997;94:1709-1714.
[1295] 480. RNA polymerase beta subunit (RNA pol B) 

RNA polymerases catalyse the DNA-dependent polymerisation of RNA. Prokaryotes contain a single RNA polymerase,
compared to three in eukaryotes (not including mitochondrial and chloroplast polymerases). Each RNA polymerase
complex contains two related members of this family; in each case they are the two largest subunits.
[1] Falkenburg D, Dworniczak B, Faust DM, Bautz EK; J Mol Biol 1987;195:929-937.
[1296] 481. RNA polymerases H / 23 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit H (gene rpoH) [1,2] is a small protein of about 8.5 to 10 Kd; it is evolutionarily
related to the C-terminal part of a 23 Kd component shared by all three forms of eukaryotic RNA polymerases (gene
RPB5 in yeast and POLR2E in mammals). As a signature pattern, a conserved region was selected which is located at
the N-terminal extremity of subunit H; this region contains two histidines that could play a role in the binding of a metal ion.
[1297] Consensus pattern: H-[NEI]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE]

[ 1] Klenk H.-P., Palm P., Lottspeich F., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 89:407-410(1992).
[ 2] Thiru A., Hodach M., Eloranta J.J., Kostourou V., Weinzierl R.O., Matthews S. J. Mol. Biol. 287:753-760(1999).

[1298] 482. RNA polymerases K / 14 to 18 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13
polypeptides. A component of 14 to 18 Kd, shared by all three forms of eukaryotic RNA polymerases and
sequenced in budding yeast (gene RPB6 or RPO26), in fission yeast (gene rpb6 or rpo15), in human and in African
swine fever virus [1], is evolutionarily related [2] to archaebacterial subunit K (gene rpoK). The archaebacterial protein
is colinear with the C-terminal part of the eukaryotic subunit.
[1299] Consensus pattern: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q

[ 1] Lu Z., Kutish G.F., Sussman M.D., Rock D.L. Nucleic Acids Res. 21:2940-2940(1993).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).

[1300] 483. RNA polymerases L / 13 to 16 Kd subunits signature

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13
polypeptides. It has been shown that small subunits of about 13 to 16 Kd found in all three types of eukaryotic
polymerases are highly conserved. Subunits known to belong to this family are: - Budding yeast RPC19 subunit from RNA
polymerases I and III [1]. - Budding yeast RPB11 subunit from RNA polymerase II [2]. - Mammalian RPB11 (gene
POLR2K) from RNA polymerase II. - Caenorhabditis elegans hypothetical protein F58A4.9. - Methanococcus jannaschii
RNA polymerase subunit L (gene rpoL). - Sulfolobus acidocaldarius RNA polymerase subunit L (gene rpoL) [3]. As a
signature pattern, a conserved region was selected which is located at the N-terminal extremity of these polymerase
subunits; this region contains two cysteines that could play a role in the binding of a metal ion.
[1301] Consensus pattern: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P

[ 1] Dequard-Chablat M., Riva M., Carles C., Sentenac A. J. Biol. Chem. 266:15300-15307(1991).
[ 2] Woychik N.A., McKune K., Lane W.S., Young R.A. Gene Expr. 3:77-82(1993).
[ 3] Langer D. EMBL/GenBank: X70805.

[1302] 484. RNA polymerases N / 8 Kd subunits signature 

In eukaryotes, there are three different forms of DNA-dependent RNA polymerases (EC 2.7.7.6) transcribing different
sets of genes. Each class of RNA polymerase is an assemblage of ten to twelve different polypeptides. In archaebacteria,
there is generally a single form of RNA polymerase, which also consists of an oligomeric assemblage of 10 to 13
polypeptides. Archaebacterial subunit N (gene rpoN) [1] is a small protein of about 8 Kd; it is evolutionarily related [2]
to an 8.3 Kd component shared by all three forms of eukaryotic RNA polymerases (gene RPB10 in yeast and POLR2J
in mammals) as well as to African swine fever virus protein CP80R [3]. As a signature pattern, a conserved region was
selected which is located at the N-terminal extremity of these polymerase subunits; this region contains two cysteines
that could play a role in the binding of a metal ion.
[1303] Consensus pattern: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G

[ 1] Langer D., Hain J., Thuriaux P., Zillig W. Proc. Natl. Acad. Sci. U.S.A. 92:5768-5772(1995).
[ 2] McKune K., Woychik N.A. J. Bacteriol. 176:4754-4756(1994).
[ 3] Rodriguez J.M., Nogal M.L., Yuste L., Enriquez C., Rodriguez J.F., Vinuela E. Virology 208:249-278(1995).

[1304] 485. Ribonuclease HII 

[1] Mian IS; Nucleic Acids Res 1997;25:3187-3189. 

[1305] 486. Ribonuclease PH signature 

Prokaryotic ribonuclease PH (EC 2.7.7.56) (RNase PH) [1] is a phosphorolytic exoribonuclease that removes nucleotide
residues following the -CCA terminus of tRNA and adds nucleotides to the ends of RNA molecules by using nucleoside
diphosphates as substrates. RNase PH is a conserved protein of about 240 amino-acid residues. It is evolutionarily
related to Caenorhabditis elegans hypothetical protein B0564.1. As a signature pattern, the most highly conserved
region was selected, which is located in the central part of these proteins.

Consensus pattern: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A

[ 1] Kelly K.O., Deutscher M.P. J. Biol. Chem. 267:17153-17158(1992).

[1306] 487. RanBP1 domain

[1] Di Matteo G, Fuschi P, Zerfass K, Moretti S, Ricordy R, Cenciarelli C, Tripodi M, Jansen-Durr P, Lavia P; Cell Growth
Differ 1995;6:1213-1224.

[1307] 488. Rhodanese signatures 







Rhodanese (thiosulfate sulfurtransferase) (EC 2.8.1.1) [1,2] is an enzyme which catalyzes the transfer of the sulfane
[...] - Escherichia coli sseA [3]. - Saccharopolyspora
erythraea cysA [4]. - Synechococcus strain PCC 7942 rhdA [5]. RhdA is a periplasmic protein probably involved in the
[...] They are based on highly [...] C-terminal extremity of the enzyme.

[1308] Consensus pattern: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF]
Consensus pattern: [FY]-[DEAF]-G-[SA]-W-x-E-[FYW] 

[ 1] Westley J. Meth. Enzymol. 77:285-291(1981). 

[ 2] Weiland K.L., Dooley T.P. Biochem. J. 275:227-231(1991).

[ 3] Rudd K.E. Unpublished observations (1993). 

[ 4] Donadio S., Shafiee A., Hutchinson C.R. J. Bacteriol. 172:350-360(1990).

[ 5] Laudenbach D.E., Ehrhardt D., Green L., Grossman A.R. J. Bacteriol. 173:2751-2760(1991).

[1309] 489. Ribonuclease III family signature 

Prokaryotic ribonuclease III (EC 3.1.26.3) (gene rnc) [1] is an enzyme that digests double-stranded RNA. It is involved
[...] that inhibits mating and meiosis by degrading a specific mRNA
required for sexual development. - Yeast ribonuclease III (gene RNT1), a dsRNA-specific nuclease that cleaves
[...] preribosomal RNA at various sites. - Caenorhabditis elegans hypothetical protein F26E4.13. - Paramecium
bursaria chlorella virus 1 protein A464R. - Synechocystis strain PCC 6803 hypothetical protein slr0346. - Fission yeast
[...] - Caenorhabditis elegans hypothetical protein K12H4.8, a protein with the same structure as SpAC8A4.08c. These
proteins [...]

[1310] Consensus pattern: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR]

[ 1] Nashimoto H., Uchida H. Mol. Gen. Genet. 201:25-29(1985).
[ 2] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997). 

[1311] 490. Rieske iron-sulfur protein signatures 

Ubiquinol-cytochrome c reductase (EC 1.10.2.2) (also known as the bc1 complex or complex III) is one of the electron
[...] of ubiquinol and
cytochrome c. In the chloroplasts of plants and in cyanobacteria, plastoquinone-plastocyanin reductase (EC 1.10.99.1)
[...] protein with a 2Fe-2S cluster [...]
called the Rieske protein [1,2]. The Rieske protein contains approximately 190 amino acid residues. The [...]
protein through cysteine and histidine residues. Two perfectly conserved regions in these
proteins contain all the residues that bind the iron-sulfur cluster. Both regions contain two cysteines and a histidine
[...] disulfide bond [3]. Two
conserved regions were selected as signature patterns.

Consensus pattern: C-[TK]-H-L-G-C-[LIVST] [The first C and the H are 2Fe-2S ligands] [The second C is involved in a disulfide bond]
Consensus pattern: C-P-C-H-x-[GSA] [The first C and the H are 2Fe-2S ligands] [The second C is involved in a disulfide bond]

[ 1] Gatti D.L., Meinhardt S.W., Ohnishi T., Tzagoloff A. J. Mol. Biol. 205:421-435(1989).
[ 2] Kallas T., Spiller S., Malkin R. Proc. Natl. Acad. Sci. U.S.A. 85:5794-5798(1988).
[ 3] Iwata S., Saynovits M., Link T.A., Michel H. Structure 4:567-579(1996).

[1313] 491. Ribosomal protein L1 signature

[...] known to bind to
[...] on the basis of sequence similarities [1,2], groups:

- Eubacterial L1. - Algal and plant chloroplast L1. - Cyanelle L1. - Archaebacterial L1. - Vertebrate L10A. - Yeast
SSM1. As a signature pattern, the best conserved region, located in the central section of these proteins, was selected.
It is located at the end of an alpha helix thought to be involved in RNA binding.

[1314] Consensus pattern: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-[KRAV]-G-x-[LIMF]-P-[...]

[ 1] Nikonov S.V., Nevskaya N., Eliseikina I.A., Fomenkova N.R., Nikulin A., Ossina N., Garber M., Jonsson B.-H.,
Briand C., Al-Karadaghi S., Svensson L.A., Aevarsson A., Liljas A. EMBO J. 15:1350-1359(1996).
[ 2] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 220:954-957(1996).

[1315] 492. Ribosomal protein L10 signature 

Ribosomal protein L10 is one of the proteins from the large ribosomal subunit. L10 is a protein of 162 to 185 amino-acid
residues which has so far been found only in eubacteria. A conserved region located in the N-terminal section of
these proteins was used as a signature pattern.

[1316] Consensus pattern: [DEH]-x(2)-[GS]-[LIVMFHSTN]-[VA]-x-[DEQK]-[LIVMA]-x(2)-[LIM]-R
[1317] 493. Ribosomal protein L10e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: - Vertebrate L10 (QM) [1]. - Plant L10. - Caenorhabditis elegans L10 (F10B5.1). -
Yeast L10 (QSR1). - Methanococcus jannaschii MJ0543. These proteins have 174 to 232 amino-acid residues. A
conserved region located in the central section was selected as a signature pattern.
[1318] Consensus pattern: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V

[ 1] Chan Y.-L., Diaz J.-J., Denoroy L., Madjar J.-J., Wool I.G. Biochem. Biophys. Res. Commun. 255-
[1319] 494. Ribosomal protein L11 signature 

[1320] Ribosomal protein L11 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L11 is
known to bind directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2], groups:

- Eubacterial L11.
- Plant chloroplast L11 (nuclear-encoded).
- Red algal chloroplast L11.
- Cyanelle L11.
- Archaebacterial L11.
- Mammalian L12.
- Plant L12.
- Yeast L12 (YL15).



[1321] L11 is a protein of 140 to 165 amino-acid residues. A conserved region located in the C-terminal section of
these proteins was selected as a signature pattern. In Escherichia coli, the C-terminal half of L11 has been shown [3]
to be in an extended and loosely folded conformation and is likely to be buried within the ribosomal structure.
[1322] Consensus pattern: [RKN]-x-[LIVM]-x-G-[ST]-x(2)-[SNQHLIVM]-G-x(2)-[LIVM]-x(0,1)-[DENG]

[ 1] Pucciarelli G., Remacha M., Ballesta J.P.G. Nucleic Acids Res. 18:4409-4416(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[ 3] Choli T. Biochem. Int. 19:1323-1338(1989).

[1323] 495. Ribosomal protein L7/L12 C-terminal domain
[1324] [1] Leijonmarck M, Liljas A; J Mol Biol 1987;195:555-579.
[1325] 496. Ribosomal protein L13 signature

Ribosomal protein L13 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L13 is known to be
one of the early assembly proteins of the 50S ribosomal subunit. It belongs to a family of ribosomal proteins which, on
the basis of sequence similarities [1], groups: - Eubacterial L13.

- Plant chloroplast L13 (nuclear-encoded). - Red algal chloroplast L13.

- Archaebacterial L13. - Mammalian L13a (Tum P198). - Yeast Rp22 and Rp23.

[1326] L13 is a protein of 140 to 250 amino-acid residues. As a signature pattern, a conserved region located in the
C-terminal section of these proteins was selected.

Consensus pattern: [...]-[LIV]-[PS]-x(4,5)-[GS]-[NQEKRA]-x(5)-[LIVM]-x-[...]
[1328] [ 1] Chan Y.-L., Olvera J., Glueck A., Wool I.G. J. Biol. Chem. 269:5589-5594(1994).
[1329] 497. Ribosomal protein L13e signature

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these
families consists of:

- Vertebrate L13 (previously known as Breast Basic Conserved protein 1 (BBC1)). - Drosophila L13. - Plant
L13. - Yeast probable L13 (YM9375.11c).

These proteins have [...] amino-acid residues. As a signature pattern, a stretch of about 16 residues in the first
third of these proteins was selected.

- Consensus pattern: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E

[1330] [ 1] Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 201:102-107(1994).
[1331] 498. Ribosomal protein L14 signature

Ribosomal protein L14 is one of the proteins from the large ribosomal subunit. In eubacteria, L14 is known to bind
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1],
groups: - Eubacterial L14. - Algal and plant chloroplast L14. - Cyanelle L14. - Archaebacterial L14. - Yeast L17A. -
Mammalian L23.

- Caenorhabditis elegans L23 (B0336.10). - Higher eukaryotes mitochondrial L14.

- Yeast mitochondrial Yml38 (gene MRPL38).

L14 is a protein of 119 to 137 amino-acid residues. As a signature pattern, a conserved region located in the C-terminal
half of these proteins was selected.

- Consensus pattern: [GA]-[LIV](3)-x(9,10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV]

[1332] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1333] 499. Ribosomal protein L15 signature

Ribosomal protein L15 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L15 is known to bind
the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1], groups: -
Eubacterial L15. - Plant chloroplast L15 (nuclear-encoded).

- Archaebacterial L15. - Vertebrate L27a. - Tetrahymena thermophila L29.

- Fungi L27a (L29, CRP-1, CYH2).

L15 is a protein of 144 to 154 amino-acid residues. As a signature pattern, a conserved region in the C-terminal
section of these proteins was selected.

- Consensus pattern: [...]-A-[...]-[LIVM]-x(3)-G

[1334] [ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1335] 500. Ribosomal protein L15e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1]. One of these families consists of:

- Mammalian L15. - Insect L15. - Plant L15. - Yeast YL10 (L13) (RplSr).
- Thermoplasma acidophilum L15.

These proteins have about 200 amino acid residues. As a signature pattern, a conserved region located in the
central section was selected.

- Consensus pattern: [DEHKR]-A-R-x-L-G-[FY]-x-[SAP]-x(2)-G-[LIVMFY](4)-R-x-R-[IV]-x-R-G

[ 1] Zwickl P., Lupas A., Baumeister W. Biochem. Biophys. Res. Commun. 209:684-688(1995).
[1336] 501. Ribosomal protein L17 signature

Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. L17 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities, groups: - Eubacterial L17.

- Yeast mitochondrial YmL8 (gene MRPL8).

Eubacterial L17 is a protein of 120 to 130 amino-acid residues. Yeast YmL8 is about twice as large (238 residues); the
sequence of its N-terminal half is colinear with that of eubacterial L17. As a signature pattern, a conserved region in
the N-terminal section was selected.

- Consensus pattern: [...]-x-[ST]-[GT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIVM]-[LIVM]-[...]
[1337] 502. Ribosomal protein L18e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Vertebrate L18 (known as L14 in Xenopus) [1]. - Plant L18.

- Yeast L18 (Rp28). - Halobacterium marismortui H129.

- Sulfolobus acidocaldarius H129e.

These proteins have 115 to 187 amino-acid residues. A stretch of about 13 residues in the first third of these proteins
has been selected as a signature pattern.

- Consensus pattern: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-x-[LIVMHNS]-[LIVM]-x-[...]

[ 1] Puder M., Barnard G.R., Staniunas R.J., Steele G.D. Jr., Chen L.B.
Biochim. Biophys. Acta 1216:134-136(1993).
[1338] 503. Ribosomal L18p family 

It has been shown that the amino-terminal 93 amino acids of Swiss:P09895 are necessary and sufficient to bind 5S
rRNA in vitro. The carboxyl-terminal half of the protein, comprising amino acids 151-296, serves to localize the protein
to the nucleolus [1].
Number of members: 26

[1] Michael WM, Dreyfuss G; Distinct domains in ribosomal protein L5 mediate 5 S rRNA binding and nucleolar
localization. J Biol Chem 1996;271:11571-11574. Medline: 96212235
[1339] 504. Ribosomal protein L19 signature

Ribosomal protein L19 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L19 is known to be
located at the 30S-50S ribosomal subunit interface and may play a role in the structure and function of the
aminoacyl-tRNA binding site. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities,
groups: - Eubacterial L19.

- Red algal chloroplast L19. - Cyanelle L19.

L19 is a protein of 120 to 130 amino-acid residues.

A conserved region in the C-terminal section has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-[KRGTI]-x-[GSAI]-[KRQDA]-[VGHRSN]-x(0,1)-[KR]-[SA]-[KY]-[KLI]-[LYS]-Y-[LIM]-R

[1340] 505. Ribosomal protein L19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of:

- Mammalian ribosomal protein L19 [1]. - Drosophila ribosomal protein L19 [2].

- Slime mold (D. discoideum) vegetative specific protein V14 [3].

- Yeast ribosomal protein L19 (YL14). - Archaebacterial ribosomal protein L19E.
[1341] These proteins have 148 to 203 amino-acid residues.

A stretch of about 20 residues in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: Q-[KR]-R-[LIVM]-x-[SA]-x(4)-[CV]-G-x(3)-[IV]-[WK]-[LIVF]-[DN]-P

[ 1] Chan Y.-L., Lin A., McNalty J., Peleg D., Meyuhas O., Wool I.G. J. Biol. Chem. 262:1111-1115(1987).
[ 2] Hart K., Klein T., Wilcox M. Mech. Dev. 43:101-110(1993).
[ 3] Singleton C.K., Manning S.S., Ken R. Nucleic Acids Res. 17:9679-9692(1989).
[1342] 506. Ribosomal protein L1e signature (Ribosomal_L4)

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists [1,2,3,4] of: - Vertebrate L1 (L4). - Drosophila L1. - Plant L1. - Yeast L2 (Rp2).

- Fission yeast L2. - Halobacterium marismortui HmaL4 (HL6).
- Methanococcus jannaschii MJ0177.

These proteins have 246 (archaebacteria) to 427 (human) amino acids. A conserved region in the N-terminal part of
these proteins has been selected as a signature pattern.

- Consensus pattern: N-x(3)-[KRM]-x(2)-A-[LIV]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[...]

[ 1] Rafti R., Gargiulo G., Manzi A., Malva C., Graziani F. Nucleic Acids Res. 17:456-456(1989).
[ 2] Presutti C., Villa T., Bozzoni I. Nucleic Acids Res. 21:3900-3900(1993).
[ 3] Bagni C., Mariottini R., Annesi P., Amaldi F. Biochim. Biophys. Acta 1216:475-478(1993).
[ 4] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).
[1343] 507. Ribosomal protein L2 signature 

Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L2 is known to bind
to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the
basis of sequence similarities [1,2], groups: - Eubacterial L2.

- Algal and plant chloroplast L2. - Cyanelle L2. - Archaebacterial L2.
- Plant L2. - Slime mold L2. - Marchantia polymorpha mitochondrial L2.

- Paramecium tetraurelia mitochondrial L2. - Fission yeast K5, K37 and KD4.

- Yeast YL6. - Vertebrate L8.

The best conserved region, located in the C-terminal section of these proteins, has been selected as
a signature pattern.

- Consensus pattern: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE]

[ 1] Marty I., Meyer Y. Nucleic Acids Res. 20:1517-1522(1992).
[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1344] 508. Ribosomal protein L20 signature 

Ribosomal protein L20 is one of the proteins from the large ribosomal subunit. In Escherichia coll, L20 is known to bind 
directly to the 23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [11 
groups: - Eubacterial L20. - Algal and plant chloroplast L20. 

- Cyanelle L20. 

L20 is a protein of about 120 amino-acid residues. A conserved region located in the central section of these proteins
has been selected as a signature pattern.
- Consensus pattern: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-[NS]-x(3)-[RKHS]

[ 1] Otaka E., Hashimoto T., Mizuta K., Suzuki K.

Protein Seq. Data Anal. 5:301-313(1993). 
[1345] 509. Ribosomal protein L21e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: 

- Mammalian L21 [1]. - Entamoeba histolytica L21 [2].

- Caenorhabditis elegans L21 (C14B9.7). - Yeast L21E (URP1) [3]. 
- Halobacterium marismortui HL31 [4].

These proteins have 160 (eukaryotes) or 95 (archaebacteria) amino-acid residues. A conserved region in the central
part of these proteins has been selected as a signature pattern. 

- Consensus pattern: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G

[ 1] Devi K.R.G., Chan Y.-L., Wool I.G.
Biochem. Biophys. Res. Commun. 162:364-370(1989).
[ 2] Petter R., Rozenblatt S., Nuchamowitz Y., Mirelman D.
Mol. Biochem. Parasitol. 56:329-333(1992).

[ 3] Jank B., Waldherr M., Schweyen R.J. Curr. Genet. 23:15-18(1993).
[ 4] Hatakeyama T., Kimura M. Eur. J. Biochem. 172:703-711(1988).

[1346] 510. Ribosomal protein L21 signature 

Ribosomal protein L21 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L21 is known to bind
to the 23S rRNA in the presence of L20. It belongs to a family of ribosomal proteins which, on the basis of sequence
similarities, groups: - Eubacterial L21.

- Marchantia polymorpha chloroplast L21. - Cyanelle L21.
- Spinach chloroplast L21 (nuclear-encoded).

Eubacterial L21 is a protein of about 100 amino-acid residues; the mature form of the spinach chloroplast L21 has 200
residues. A conserved region located in the C-terminal section of these proteins has been selected as a signature
pattern.

- Consensus pattern: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-[ST]
[1347] 511. Ribosomal protein L22 signature

Ribosomal protein L22 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L22 is known to bind 
23S rRNA. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3], groups: -
Eubacterial L22.

- Algal and plant chloroplast L22 (in legumes L22 is encoded in the nucleus instead of the chloroplast). - Cyanelle
L22. - Archaebacterial L22.

- Mammalian L17. - Plant L17. - Yeast YL17.

A conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM]
[ 1] Gantt J.S., Baldauf S.L., Calle R.J., Weeden N.F., Palmer J.D.

EMBO J. 10:3073-3078(1991).
[ 2] Madsen L.H., Kreiberg J.D., Gausing K. Curr. Genet. 19:417-422(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K., Suzuki K.
Protein Seq. Data Anal. 5:301-313(1993).

[1348] 512. Ribosomal protein L23 signature 

Ribosomal protein L23 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L23 is known to bind
to the 23S rRNA; in yeast, the corresponding protein binds to a homologous site on the 26S rRNA.

It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: - Eubacterial L23.

- Algal and plant chloroplast L23. - Archaebacterial L23. - Mammalian L23A.

- Caenorhabditis elegans L23A (F55D10.2). - Fungi L25. 

- Yeast mitochondrial YmL41 (gene MRPL41 or MRP20). 

[1349] A small conserved region in the C-terminal section of these proteins, which is probably involved in rRNA
binding, has been selected as a signature pattern [2].

- Consensus pattern: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANEQK]-x(7)-[LIVMFT]

[ 1] El Baradi T.T.A.L., Raue H.A., van de Regt C.H.R., Verbree E.C.,
Planta R.J. EMBO J. 4:2101-2107(1985).

[ 2] Raue H.A., Otaka E., Suzuki K. J. Mol. Evol. 28:418-426(1989).
[ 3] Fearon K., Mason T.L. J. Biol. Chem. 267:5162-5170(1992).
[ 4] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1350] 513. Ribosomal protein L24 signature 

Ribosomal protein L24 is one of the proteins from the large ribosomal subunit. L24 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities, groups: - Eubacterial L24.

- Plant chloroplast L24 (nuclear-encoded). - Red algal L24. - Vertebrate L26.

- Yeast L26 (YL33). - Archaebacterial HmaL24 (HL15).

- A probable ribosomal protein from Sulfolobus acidocaldarius [1]. 

In their mature form, these proteins have 103 to 150 amino-acid residues.

A conserved stretch of 20 residues in their N-terminal section has been selected as a signature pattern. 

- Consensus pattern: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KRA]-[GNQ]-x(2,3)-[GA]-x-

[ 1] Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995).

[1351] 514. Ribosomal protein L24e signature 

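A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: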

- Mammalian ribosomal protein L24. 

- Yeast ribosomal protein L30A/B (Rp29) (YL21).
- Kluyveromyces lactis ribosomal protein L30.

- Arabidopsis thaliana ribosomal protein L24 homolog.
- Haloarcula marismortui ribosomal protein HL21/HL22.

- Methanococcus jannaschii MJ1201.

These proteins have 60 to 160 amino-acid residues. The most conserved region, which is located in the N-terminal
region of these proteins, has been selected as a signature pattern.

- Consensus pattern: [FY]-x-[GSH]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D

[ 1] Chan Y.-L., Olvera J., Wool I.G. Biochem. Biophys. Res. Commun. 202:1176-1180(1994).
[1352] 515. Ribosomal protein L27 signature

Ribosomal protein L27 is one of the proteins from the large ribosomal subunit. L27 belongs to a family of ribosomal

proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial L27.

- Plant chloroplast L27 (nuclear-encoded). - Algal chloroplast L27. 

- Yeast mitochondrial YmL2 (gene MRPL2 or MRP7).
The schematic relationship between these groups of proteins is shown below.

Eub. L27    Nxxxxxxxxx
Algal L27   Nxxxxxxxxx
Plant L27   tttttNxxxxxxxxxxxxx
Yeast MRP7  tttNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

't': transit peptide.
'N': N-terminal of mature protein. '*': position of the pattern.

- Consensus pattern: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G 

[ 1] Elhag G.A., Bourque D.P. Biochemistry 31:6856-6864(1992).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 

[1353] 516. Ribosomal L28 family

The ribosomal L28 family includes L28 proteins from bacteria and chloroplasts. The L24 protein from yeast (Swiss:P36525)
also contains a region of similarity to prokaryotic L28 proteins. L24 from yeast is also found in the large ribosomal
subunit.
Number of members: 24
[1354] 517. Ribosomal protein L29 signature

Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal 
proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L29. - Red algal L29. 

- Archaebacterial L29. - Mammalian L35. - Caenorhabditis elegans L35 (ZK652.4).

- Yeast L35. 

L29 is a protein of 63 to 138 amino-acid residues.

A conserved region located in the central section of L29 has been selected as a signature pattern. 

- Consensus pattern: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIWSTA]-[KR]-[KRHQS]-[DESTANRL]-[LIV]-A-
[KRCQVT]-[LIVMA]

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 
[1355] 518. Ribosomal protein L3 signature 

Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind
to the 23S rRNA and may participate in the formation of the peptidyltransferase center of the ribosome. It belongs to
a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L3. - Red
algal L3. - Cyanelle L3.

- Archaebacterial Halobacterium marismortui HmaL3 (HL1).

- Yeast L3 (also known as trichodermin resistance protein) (gene TCM1).

- Arabidopsis thaliana L3 (genes ARP1 and ARP2). - Mammalian L3 (L4).

- Mammalian mitochondrial L3. - Yeast mitochondrial YmL9 (gene MRPL9).

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R

[ 1] Arndt E., Kroemer W., Hatakeyama T. J. Biol. Chem. 265:3034-3039(1990).

[ 2] Graack H.-R., Grohmann L., Kitakawa M., Schaefer K.L., Kruft V.

Eur. J. Biochem. 206:373-380(1992).

[ 3] Herwig S., Kruft V., Wittmann-Liebold B.

Eur. J. Biochem. 207:877-885(1992).

[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K.

Protein Seq. Data Anal. 5:301-313(1993). 

[1356] 519. Ribosomal protein L30 signature

Ribosomal protein L30 is one of the proteins from the large ribosomal subunit. L30 belongs to a family of ribosomal 
proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L30. - Archaebacterial L30. 






EP1 033 405A2 

- Drosophila L7. - Slime mold L7. - Mammalian L7. - Fungi L7 (YL8). 

- Yeast mitochondrial L33.

L30 from eubacteria are small proteins of about 60 residues, those from archaebacteria are proteins of about 150
residues. Eukaryotic L7 are proteins of about 2[…] to 270 residues. The schematic relationship between the three groups
of proteins is shown below.

Eub. L30  NxxxxxxxxxxC
Arc. L30  NxxxxxxxxxxxxxxxxxxxxxxxxxxxC
Euk. L7   NxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxC

'*': position of the pattern.

The signature pattern for this family of ribosomal proteins spans the N-terminal half of the region common to all these
proteins.

- Consensus pattern: [IV]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-[…]-[…VA]-x(2)-[LMFY]-[IVT]

[ 1] Mizuta K., Hashimoto T., Otaka E.
Nucleic Acids Res. 20:1011-1016(1992). 
[1357] 520. Ribosomal protein L31 signature

Ribosomal protein L31 is one of the proteins from the large ribosomal subunit. L31 is a protein of 66 to 97 amino-acid
residues which has only been found so far in eubacteria and in some algal chloroplasts.

A conserved region located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: H-P-F-[FY]-[TI]-x(9)-G-R-[AIV]-x-[KRQ]
[1358] 521. Ribosomal protein L31e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: 

- Mammalian L31 [1]. - Chlamydomonas reinhardtii L31. - Yeast L34.
- Halobacterium marismortui HL30 [2].

These proteins have 87 to 128 amino-acid residues. 

A conserved region, located in the central section, has been selected as a signature pattern.

- Consensus pattern: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AKH]-x-W-x-[KR]-G

[ 1] Tanaka T., Kuwano Y., Kuzumaki T., Ishikawa K., Ogata K. Eur. J. Biochem. 162:45-48(1987).
[ 2] Bergmann U., Arndt E.

Biochim. Biophys. Acta 1050:55-60(1990).
[1359] 522. Ribosomal protein L33 signature 

Ribosomal protein L33 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L33 has been shown
to be on the surface of the 50S subunit. L33 belongs to a family of ribosomal proteins which, on the basis of sequence
similarities [1,2,3], groups: - Eubacterial L33.

- Algal and plant chloroplast L33. - Cyanelle L33. 

L33 is a small protein of 49 to 66 amino-acid residues. A conserved region located in the central section of L33 has
been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PATQ]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD]

[ 1] Kruft V., Kapp U., Wittmann-Liebold B. Biochimie 73:855-860(1991).
[ 2] Sharp P.M. Gene 139:129-130(1994).
[ 3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1360] 523. Ribosomal protein L34 signature 

[1361] Ribosomal protein L34 is one of the proteins from the large subunit of the prokaryotic ribosome. It is a small
basic protein of 44 to 51 amino-acid residues [1]. L34 belongs to a family of ribosomal proteins which, on the basis of
sequence similarities, groups: - Eubacterial L34.

- Red algal chloroplast L34. - Cyanelle L34.
A conserved region that corresponds to the N-terminal half of L34 has been selected as a signature pattern.

- Consensus pattern: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R
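Because the gap x(4,5) can take two lengths, the L34 signature does not cover a fixed number of residues. A minimal sketch in Python (the helper pattern_span is hypothetical, written only for illustration) adds up the minimum and maximum length of every element to give the shortest and longest stretch of sequence a PROSITE-style pattern can match:

```python
import re

# Hypothetical helper: each element is x, a single residue, [..] or {..},
# optionally followed by a repeat count (n) or a range (n,m).
ELEMENT = re.compile(r"(?:x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def pattern_span(pattern: str) -> tuple[int, int]:
    """Return the (shortest, longest) number of residues a pattern can match."""
    shortest = longest = 0
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        low = int(m.group(1)) if m.group(1) else 1   # no count means one residue
        high = int(m.group(2)) if m.group(2) else low
        shortest += low
        longest += high
    return shortest, longest


l34 = "K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R"
print(pattern_span(l34))  # (20, 21)
```

Knowing the span is useful when deciding how much flanking sequence to extract around a hit, since a single pattern can match stretches of different lengths.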

[ 1] Old I.G., Margarita D., Saint Girons I.
Nucleic Acids Res. 20:6097-6097(1992).
[1362] 524. Ribosomal protein L34e signature 

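A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: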

- Mammalian L34. - Mosquito L31 [1]. - Plant L34 [2]. 

- Yeast putative ribosomal protein YIL052c. - Methanococcus jannaschii MJ0655.

These proteins have 89 to 129 amino-acid residues.

A conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: Y-x-[ST]-x-S-[NY]-x(5)-[KR]-T-P.G 

[ 1] Lan Q., Niu L.L., Fallon A.M.
Biochim. Biophys. Acta 1218:460-462(1994).
[ 2] Gao J., Kim S.R., Chung Y.Y., Lee J.M., An G.
Plant Mol. Biol. 25:761-770(1994).

[1363] 525. Ribosomal protein L35Ae signature 

- Vertebrate L35A. - Caenorhabditis elegans L35A (F10E7.7). 

- Yeast L37A/L37B (Rp47). - Pyrococcus woesei L35A homolog [1].

These proteins have 87 to 110 amino-acid residues.

A highly conserved stretch of 22 residues in the C-terminal part of these proteins has been selected as a signature
pattern.

- Consensus pattern: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P 

[ 1] Ouzounis C., Kyrpides N., Sander C.
Nucleic Acids Res. 23:565-570(1995). 
[1364] 526. Ribosomal protein L36 signature 

Ribosomal protein L36 is one of the proteins from the large subunit of the prokaryotic ribosome. It belongs to a family

of ribosomal proteins which, on the basis of sequence similarities [1], groups: - Eubacterial L36. - Algal and
plant chloroplast L36.

- Consensus pattern: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-
[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1365] 527. Ribosomal protein L36e signature

- Drosophila L36 (M(1)1B). - Caenorhabditis elegans L36 (F37C12.4).

- Candida albicans L39. - Yeast YL39. 

These proteins have 99 to 104 amino acids. 
A conserved region in the central part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-Y-E-[KR]-R-x-[LIVM]-[DEHLIVM](2)-[KR]

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G.

Biochem. Biophys. Res. Commun. 192:849-853(1993).

[1366] 528. Ribosomal protein L39e signature 

section of these proteins has been selected as a signature pattern.

- Consensus pattern: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R

[ 1] Lin A., McNally J., Wool I.G. J. Biol. Chem. 259:487-490(1984).
[ 2] Leer R.J., van Raamsdonk-Duin M.M.C., Kraakman P., Mager W.H.,
Planta R.J. Nucleic Acids Res. 13:701-709(1985).
[ 3] Ramirez C., Louie K.A., Matheson A.T. FEBS Lett. 250:416-418(1989).

[1367] 529. Ribosomal L40e family

[…] fused to a ubiquitin protein [2].

[1]

Medline: 88203200

RNA binding proteins of the large subunit of bovine mitochondrial ribosomes.
Piatyszek MA, Denslow ND, O'Brien TW;
Nucleic Acids Res 1988;16:2565-2583.
[2] Medline: 96011832
The carboxyl extensions of two rat ubiquitin fusion proteins are ribosomal proteins S27a and L40.

Chan YL, Suzuki K, Wool IG;
Biochem Biophys Res Commun 1995;215:682-690.
[1368] 530. (Ribosomal L44) Ribosomal protein L44e signature

- Mammalian L44 [1]. - Trypanosoma brucei L44.

- Caenorhabditis elegans L44 (C09H10.2). - Fungal L44 (L41).

- Halobacterium marismortui LA [2].

These proteins have 92 to 105 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern. 

- Consensus pattern: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C

[ 1] Gallagher M.J., Chan Y.-L., Lin A., Wool I.G. DNA 7:269-273(1988).
[ 2] Bergmann U., Wittmann-Liebold B.
Biochim. Biophys. Acta 1173:195-200(1993).

[1369] 531. Ribosomal protein L5 signature

- Algal chloroplast L5. - Cyanelle L5. - Archaebacterial L5. - Mammalian L11.

- Tetrahymena thermophila L21. - Slime mold L5 (V18). - Yeast L16 (39A).

- Plants mitochondrial L5. 
L5 is a protein of about 180 amino-acid residues.

A conserved region, located in the first third of these proteins, has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-[LIVM]-[STAVC]-[GE]-[QV]-x(2)-[LIVM]-x-[STC]-x-[STAG]-[KRH]-x-p

[ 1] Hatakeyama T., Hatakeyama T. Biochim. Biophys. Acta 1039:343-347(1990).

[ 2] Rosendahl G., Andreasen P.H., Kristiansen K. Gene 98:161-167(1991).

[ 3] […] A.T., Auer J., Spicker G., Boeck A. Biochimie 73:679-682(1991).

[ 4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1370] 532. Ribosomal L5P family C-terminus

[…] associated with Ribosomal_L5. Number of members: 60

[1372] 533. Ribosomal protein L6 signatures 

[1373] Ribosomal protein L6 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L6 is known
to bind directly to the 23S rRNA and is located at the aminoacyl-tRNA binding site of the peptidyltransferase center. It
belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2,3,4], groups: - Eubacterial L6.

- Algal chloroplast L6.
- Cyanelle L6.
- Archaebacterial L6.

- Marchantia polymorpha mitochondrial L6.

- Yeast mitochondrial YmL6 (gene MRPL6).
- Mammalian L9.

- Drosophila L9.

- Plants L9.

- Yeast L9 (YL11).

Although these proteins are evolutionarily related, it is very difficult to derive a pattern that will find them all.

- Consensus pattern: [PSHDENS]-x-Y-K-[GA]-K-G-[LIVM] 

- Consensus pattern: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-
[1] Suzuki K., Olvera J., Wool I.G. Gene 93:297-300(1990).

[2] Schwank S., Harrer R., Schueller H.-J., Schweizer E. Curr. Genet. 24:136-140(1993).

[3] Golden B.L., Ramakrishnan V., White S.W. EMBO J. 12:4901-4908(1993).

[4] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).

[1375] 534. Ribosomal protein L6e signature 


- Mammalian ribosomal protein L6 (TAX-responsive enhancer element binding protein).

- Caenorhabditis elegans ribosomal protein L6 (R151.3).

- Yeast ribosomal protein YL16A/L16B.

- Mesembryanthemum crystallinum ribosomal protein YL16-like.

- Consensus pattern: N-x(2)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K
[1376] 535. Ribosomal protein L7Ae signature

- Vertebrate L7A (SURF3) [1]. - Plant L7A. - Yeast L7A (YL5) (Rp6).

- Yeast protein NHP2 [2]. - Yeast hypothetical protein YEL026w.

- Bacillus subtilis hypothetical protein ylxQ. - Halobacterium marismortui Hs6.
- Methanococcus jannaschii MJ1203.

[1377] These proteins have 100 to 265 amino-acid residues.

A conserved region located in the central section has been selected as a signature pattern.

- Consensus pattern: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G

[ 1] Colombo P., Yon J., Garson K., Fried M. Proc. Natl. Acad. Sci. U.S.A. 89:6358-6362(1992).
[ 2] Kolodrubetz D., Burgum A. Yeast 7:79-90(1991).

[1378] 536. Ribosomal protein L9 signature

Ribosomal protein L9 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L9 is known to bind
[…]

- Plant chloroplast L9 (nuclear-encoded). - Red algal chloroplast L9.

A conserved region, located in the N-terminal section of these proteins, has been selected as a signature pattern.

- Consensus pattern: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN]

[ 1] […] Ramakrishnan V. EMBO J. 13:[…]

[ 2] Otaka E., Hashimoto T., Mizuta K., Suzuki K. Protein Seq. Data Anal. 5:301-313(1993).
[1379] 537. Ribosomal protein S10 signature 

Ribosomal protein S10 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S10 is known to be […]

- Algal chloroplast S10. - Cyanelle S10. - Archaebacterial S10. 

- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S10.

- Arabidopsis thaliana mitochondrial S10 (nuclear encoded). - Vertebrate S20.

- Plant S20. - Yeast URP2. 

S10 is a protein of about 100 amino-acid residues. 

[1380] A conserved region located in the center of these proteins has been selected as a signature pattern.

- Consensus pattern: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993). 
[1381] 538. Ribosomal protein S11 signature 

Ribosomal protein S11 plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on

the large lobe of the small ribosomal subunit. S11 belongs to a family of ribosomal proteins which, on the basis of
sequence similarities [2], groups: - Eubacterial S11.

- Algal and plant chloroplast S11. - Cyanelle S11. - Archaebacterial S11.

- Marchantia polymorpha and Prototheca wickerhamii mitochondrial S11.

- Acanthamoeba castellanii mitochondrial S11. - Neurospora crassa S14 (crp-2). - Yeast S14 (RP59 or CRY1).

- Mammalian, Drosophila, Trypanosoma, and plant S14.

- Caenorhabditis elegans S14 (F37C12.9).

One of the best conserved regions in these proteins was selected as a signature pattern.
[ 1] Kimura M., Kimura J., Hatakeyama T. FEBS Lett. 240:15-20(1988).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1382] 539. Ribosomal protein S12 signature

- Algal and plant chloroplast S12. - Cyanelle S12.

- Protozoa and plant mitochondrial S12. - Yeast S28.

- Consensus pattern: [RK]-x-P-N-S-[AR]-x-R

[ 1] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).
[1383] 540. Ribosomal protein S12e signature 

- Trypanosoma brucei S12 [2]. - Caenorhabditis elegans S12 (F54E7.2).

- Drosophila S12. - Yeast S12.

These proteins have […] to 150 amino acids.

A conserved region in the N-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L
[ 1] Lin A., Chan Y.-L., Jones R., Wool I.G.

- Plant chloroplast S13 (nuclear encoded). - Red algal chloroplast S13 

- Cyanelle S13. - Archaebacterial S13. - Plant mitochondrial S13.
- Mammalian and plant S18.

The best conserved region in these proteins, located in their C-terminal part, has been selected as a signature pattern.

- Consensus pattern: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q

[ 1] Chan Y.-L., Paz V., Wool I.G.

Biochem. Biophys. Res. Commun. 178:1212-1218(1991).

[ 2] Otaka E., Hashimoto T., Mizuta K.

Protein Seq. Data Anal. 5:285-300(1993).

Ribosomal protein S14p/S29e (Ribosomal protein S14 signature)
groups:

- Eubacterial S14.

- Algal and plant chloroplast S14.

- Cyanelle S14.

- Archaebacterial Methanococcus vannielii S14.
- Plant mitochondrial S14.

- Yeast mitochondrial MRP2.
- Mammalian S29.

- Yeast YS29A/B.

[1388] Consensus pattern: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SCHRG]-x(3)-[RN]

[1] Chan Y.-L., Suzuki K., Olvera J., Wool I.G. Nucleic Acids Res. 21:649-655(1993).
[2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1389] 543. Ribosomal protein S15 signature

Ribosomal protein S15 is one of the proteins from the small ribosomal subunit. It belongs to a family of ribosomal proteins
which, on the basis of sequence similarities [1,2], groups: - Eubacterial S15.

- Archaebacterial Halobacterium marismortui HmaS15 (HS11).

- Plant chloroplast S15. - Yeast mitochondrial S28. - Mammalian S13.

- Brugia pahangi and Wuchereria bancrofti SIS (S15). - Yeast S13 (YS15).

S15 is a protein of 80 to 250 amino-acid residues.

A conserved region located in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-

[ 1] Dang H., Ellis S.R.
Nucleic Acids Res. 18:6895-6901(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1390] 544. Ribosomal protein S16 signature 

- Eubacterial S16.

- Algal and plant chloroplast S16.

- Cyanelle S16.

- Neurospora crassa mitochondrial S24 (cyt-21).

[1393] Consensus pattern: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR]

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1395] 545. Ribosomal protein S17 signature

- Plant chloroplast S17 (nuclear encoded). - Red algal chloroplast S17.
- Cyanelle S17. - Archaebacterial S17. - Mammalian and plant cytoplasmic S11.
- Yeast S18a and S18b (RP41; YS12).

A conserved region of these proteins has been selected as a signature pattern.

- Consensus pattern: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S

[ 1] Gantt J.S., Thompson M.D. J. Biol. Chem. 265:2763-2767(1990).
[ 2] Herfurth E., Hirano H., Wittmann-Liebold B.
Biol. Chem. Hoppe-Seyler 372:955-961(1991).
[ 3] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1396] 546. Ribosomal protein S17e signature

- Vertebrate S17 [1]. - Drosophila S17 [2]. - Neurospora crassa S17 (crp-[…]).

- Consensus pattern: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H

[ 1] Chen I.-T., Roufa D.J. Gene 70:107-116(1988).
[ 2] Maki C., Rhoads D.D., Stewart M.J., van Slyke B., Denell R.E.,
Roufa D.J. Gene 79:289-298(1989).
[ 3] Abovich N., Rosbash M.
Mol. Cell. Biol. 4:1871-1879(1984).

[1397] 547. Ribosomal protein S18 signature
[1398] 548. Ribosomal protein S19 signature

S19 belongs to a family of ribosomal proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial S19.

- Algal and plant chloroplast S19. - Cyanelle S19. - Archaebacterial S19.

- Plant mitochondrial S19. - Eukaryotic S15 ('rig' protein).

[…] based on the few conserved positions.

- Consensus pattern: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-

[ 1] Kitagawa M., Takasawa S., Kikuchi N., Itoh T., Teraoka H., Yamamoto H., Okamoto H. FEBS Lett. 283:210-214.

[ 2] Otaka E., Hashimoto T., Mizuta K.
Protein Seq. Data Anal. 5:285-300(1993).

[1399] 549. Ribosomal protein S19e signature

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities
[1,2]. One of these families consists of: - Mammalian S19. - Drosophila S19.

- Ascaris lumbricoides S19g (ALEP-1) and S19s. - Yeast YS16 (RP55A and RP55B).

- Aspergillus S16. - Halobacterium marismortui HS12. 

These proteins have 143 to 155 amino acids. 

A well conserved stretch of 20 residues in the C-terminal part of these proteins has been selected as a signature pattern.

- Consensus pattern: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ]

[ 1] Etter A., Aboutanos M., Tobler H., Mueller F. 

Proc. Natl. Acad. Sci. U.S.A. 88:1593-1596(1991). 

[ 2] Suzuki K., Olvera J., Wool I.G. Biochimie 72:299-302(1990).

[1400] 550. Ribosomal protein S2 signatures

Ribosomal protein S2 is one of the proteins from the small ribosomal subunit. S2 belongs to a family of ribosomal
proteins which, on the basis of sequence similarities [1,2], groups: - Eubacterial S2. - Algal and plant chloroplast S2.

- Cyanelle S2. - Archaebacterial S2.

- Higher eukaryotes P40 (previously thought to be a laminin receptor). 

- Yeast NAB1. - Plant mitochondrial S2. - Yeast mitochondrial MRP4.

S2 is a protein of 235 to 394 amino-acid residues. 

Two conserved regions have been selected as signature patterns. One is located in the N-terminal section and the
other in the central section.

- Consensus pattern: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G

- Consensus pattern: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-[GNQKRH]-[LIVM]-

[ 1] Davis S.C., Tzagoloff A., Ellis S.R.
J. Biol. Chem. 267:5508-5514(1992).

[ 2] Tohgo A., Takasawa S., Munakata H., Yonekura H., Hayashi N., Okamoto H. FEBS Lett. 340:133-138(1994).
[1401] 551. Ribosomal protein S21 signature 

[1402] Ribosomal protein S21 is one of the proteins from the small ribosomal subunit. So far S21 has only been
found in […]. A conserved region located in the N-terminal section of the
protein has been selected as a signature pattern.

[1403] Consensus pattern: [DE]-x-A-[LIY]-[KR]-R-F-K-[KR]-x(3)-[KR]
[1404] 552. Ribosomal protein S21e signature

- Caenorhabditis elegans S21 (F37C12.11). - Rice S21 [2].

- Yeast S21 (Ys25) [3]. - Fission yeast S28 [4].

These proteins have 82 to 87 amino acids. 

A perfectly conserved nonapeptide in the N-terminal part of these proteins has been selected as a signature pattern. 

- Consensus pattern: L-Y-V-P-R-K-C-S-[SA] 
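Because this signature contains no 'x' positions, the set of peptides it accepts is small enough to list. A minimal sketch in Python (expand_pattern is a hypothetical helper, not a library function) enumerates every peptide matched by a PROSITE-style pattern made only of fixed residues and bracketed alternatives:

```python
from itertools import product


def expand_pattern(pattern: str) -> list[str]:
    """List every peptide matched by a pattern of fixed residues and [..] classes."""
    choices = []
    for element in pattern.split("-"):
        element = element.strip()
        if element.startswith("[") and element.endswith("]"):
            choices.append(element[1:-1])      # any one of the listed residues
        elif len(element) == 1 and element.isalpha() and element.isupper():
            choices.append(element)            # a single fixed residue
        else:
            # x, {..} and repeat counts would make the list huge or open-ended
            raise ValueError(f"unsupported pattern element: {element!r}")
    return ["".join(residues) for residues in product(*choices)]


print(expand_pattern("L-Y-V-P-R-K-C-S-[SA]"))  # ['LYVPRKCSS', 'LYVPRKCSA']
```

The same approach applies to other short, fully specified signatures such as an octapeptide with one bracketed position; patterns containing 'x' positions are better searched as regular expressions than enumerated.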

[ 1] Bhat K.S., Morrison S.G. Nucleic Acids Res. 21:2939-2939(1993). 
[ 2] Nishi R., Hashimoto H., Uchimiya H., Kato A.

Biochim. Biophys. Acta 1216:113-114(1993).
[ 3] Suzuki K., Otaka E. Nucleic Acids Res. 16:6223-6223(1988).
[ 4] Itoh T., Otaka E., Matsui K.A. Biochemistry 24:7418-7423(1985).

[1405] 553. Ribosomal protein S24e signature 
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities.
One of these families consists of: 

- Vertebrate S24 [1]. - Yeast Rp50. - Mucor racemosus S24 [2].

- Halobacterium marismortui HS15 [3]. - Methanococcus jannaschii MJ0394.
These proteins have 101 to 148 amino acids. 

A well conserved stretch in the central part of these proteins has been selected as a signature pattern. 

- Consensus pattern: [FYA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SDN]

[ 1] Brown S.J., Jewell A., Maki C.G., Roufa D.J. Gene 91:293-296(1990).
[ 2] Sosa L., Fonzi W.A., Sypherd R.S.

[1406] Nucleic Acids Res. 17:9319-9331(1989).
[ 3] Kimura J., Arndt E., Kimura M. FEBS Lett. 224:65-70(1987).
[1407] 554. Ribosomal protein S26e signature 

A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities. One of these families 
consists of: - Mammalian S26 [1]. 

- Octopus S26 [2]. - Drosophila S26 (DS31) [3]. - Plant cytoplasmic S26.

- Fungi S26 [4]. 

These proteins have 114 to 127 amino acids. 

A conserved octapeptide in the central part of these proteins has been selected as a signature pattern. 

- Consensus pattern: [YH]-C-V-S-C-A-I-H

[ 1] Kuwano Y., Nakanishi O., Nabeshima Y., Tanaka T., Ogata K. J. Biochem. 97:983-992(1985).
[ 2] Zinov'eva R.D., Tomarev S.I. Dokl. Akad. Nauk SSSR 304:464-469(1989).
[ 3] Itoh N., Ohta K., Ohta M., Kawasaki T., Yamashina I. Nucleic Acids Res. 17:2121-2121(1989).
[ 4] Wu M., Tan H. Gene 150:401-402(1994).

[1408] 555. Ribosomal protein S28e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. 
One of these families consists of:

- Mammalian S28 [1]. - Plant S28 [2]. - Fungi S33 [3].
- Methanococcus jannaschii MJ1202.

These proteins have from 64 to 78 amino acids.

A highly conserved nonapeptide from the C-terminal extremity of these proteins has been selected as a signature
pattern.

- Consensus pattern: E-[ST]-E-R-E-A-R-x-L

[ 1] Chan Y.-L., Olvera J., Wool I.G.

Biochem. Biophys. Res. Commun. 179:314-318(1991).

[ 2] Hwang I., Goodman H.M. Plant Physiol. 102:1357-1358(1993).

[ 3] Hoekstra R., Ferreira R.M., Scotsman T.C., Mager W.H., Planta R.J. Yeast 8:949-959(1992).

[1409] 556. Ribosomal protein S3Ae signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian S3A (originally known as v-fos transformation effector protein).
- Caenorhabditis elegans S3A (F56F3.5).
- Plant cytoplasmic S3A (CYC07) [1].
- Yeast Rp10 (PLC1 and PLC2).
- Fission yeast Rp10 (SpAC13G6.02c).
- Methanococcus jannaschii MJ0980.

These proteins have from 220 to 250 amino acids.

A conserved stretch in their N-terminal section was selected as a signature pattern.

- Consensus pattern: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L

[ 1] Liu J.H., Reid D.M. Plant Physiol. 109:338-338(1995).

[1410] 557. Ribosomal protein S3 signature 

Ribosomal protein S3 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S3 is known to be involved in the binding of initiator Met-tRNA. On the basis of sequence similarities [1], it belongs to a family of ribosomal proteins that includes:

- Eubacterial S3.
- Algal and plant chloroplast S3.
- Cyanelle S3.
- Archaebacterial S3.
- Plant mitochondrial S3.
- Vertebrate S3.
- Insect S3.
- Caenorhabditis elegans S3 (C23G10.3).
- Yeast S3 (Rp13).

S3 is a protein of 209 to 559 amino-acid residues.

A conserved region located in the C-terminal section has been selected as a signature pattern. 
- Consensus pattern: ...-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-...



[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1411] 558. Ribosomal protein S4 signature

Ribosomal protein S4 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S4 is known to bind directly to 16S ribosomal RNA. Mutations in S4 have been shown to increase translational error frequencies. On the basis of sequence similarities [1,2], it belongs to a family of ribosomal proteins that includes:

- Eubacterial S4.
- Algal and plant chloroplast S4.
- Cyanelle S4.
- Archaebacterial S4.
- Mammalian S9.
- Yeast YS11 (SUP45).
- Marchantia polymorpha mitochondrial S4.
- Dictyostelium discoideum rp1024.
- Yeast protein NAM9 [3]. NAM9 has been characterized as a suppressor for ochre mutations in mitochondrial DNA. It could be a ribosomal protein that acts as a suppressor by decreasing translation accuracy.

S4 is a protein of 171 to 205 amino-acid residues (except for NAM9, which is much larger). The signature pattern for this protein is based on a conserved region located in the central section of these proteins.

- Consensus pattern: ...

[ 1] Mizuta K., Hashimoto T., Suzuki K.I., Otaka E. Nucleic Acids Res. 19:2603-2608(1991).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[ 3] Boguta M., Dmochowska A., Borsuk R., Wrobel K., Gargouri A., Lazowska J., Slonimski P.P., Szczesniak B., Kruszewska A. Mol. Cell. Biol. 12:402-412(1992).

[1412] 559. Ribosomal protein S4e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian S4 [1]. Two highly similar isoforms of this protein exist: one coded by a gene on chromosome Y, and the other on chromosome X.
- Plant cytoplasmic S4 [2].
- Yeast S7 (YS6).
- Archaebacterial S4e.

These proteins have 233 to 264 amino acids.

A highly conserved stretch of 15 residues in their N-terminal section has been selected as a signature pattern. Four positions in this region are positively charged residues.
- Consensus pattern: H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP]
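The descriptions in this section state pattern lengths such as an octapeptide, a nonapeptide, or a stretch of 15 residues. The small sketch below assumes the same hyphen-separated notation and takes the S4e pattern to read H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP]. It computes the minimum and maximum number of residues a pattern spans, which gives a quick consistency check on such statements. The helper name pattern_span is hypothetical.

```python
import re

# Same element grammar as the patterns in this section:
# x, a residue, [...] or {...}, with an optional (n) or (n,m) repeat.
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def pattern_span(pattern: str) -> tuple[int, int]:
    """Return (min, max) residues matched by a hyphen-separated pattern."""
    shortest = longest = 0
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        _, low, high = m.groups()
        n_min = int(low) if low else 1        # no repeat count means one residue
        n_max = int(high) if high else n_min  # '(n)' is a fixed repeat
        shortest += n_min
        longest += n_max
    return shortest, longest


print(pattern_span("H-x-K-R-[LIVMF]-[SANK]-x-P-x(2)-[WY]-x-[LIVM]-x-[KRP]"))  # (15, 15)
print(pattern_span("E-[ST]-E-R-E-A-R-x-L"))                                   # (9, 9)
```

A pattern with a variable gap, such as x(5,6) in the S5 signature, yields a range rather than a single length.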

[ 1] Fisher E.M., Beer-Romero P., Brown L.G., Ridley A., McNeil J.A., Lawrence J.B., Willard H.F., Bieber F.R., Page D.C. Cell 63:1205-1218(1990).

[ 2] Braun H.P., Emmermann M., Mentzel H., Schmitz U.K. Biochim. Biophys. Acta 1218:435-438(1994).

[1413] 560. Ribosomal protein S5 signature 

Ribosomal protein S5 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S5 is known to be important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase translational error frequencies. On the basis of sequence similarities [1,2], it belongs to a family of ribosomal proteins that includes:

- Eubacterial S5.
- Cyanelle S5.
- Red algal chloroplast S5.
- Archaebacterial S5.
- Mammalian S2 (LLrep3).
- Caenorhabditis elegans S2 (C49H3.11).
- Drosophila S2.
- Plant S2.
- Yeast S4 (SUP44).
- Fungi mitochondrial S5.

S5 is a protein of 166 to 254 amino-acid residues. The signature pattern for this protein is based on a conserved region, rich in glycine residues, and located in the N-terminal section of these proteins.

- Consensus pattern: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-[LIVM]-G-x-[SAG]-x(5,6)-[DEQ]-[LIVMA]-x(2)-A-[LIVMF]

[ 1] All-Robyn J.A., Brown N., Otaka E., Liebman S.W. Mol. Cell. Biol. 10:6544-6553(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1414] 561. Ribosomal protein S6 signature

Ribosomal protein S6 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S6 is known to bind together with S18 to 16S ribosomal RNA. On the basis of sequence similarities, it belongs to a family of ribosomal proteins that includes:

- Eubacterial S6.
- Red algal chloroplast S6.
- Cyanelle S6.

S6 is a protein of 95 to 208 amino-acid residues. The signature pattern for this protein is based on a conserved region located in the N-terminal section of these proteins.

- Consensus pattern: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA]
[1415] 562. Ribosomal protein S6e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of:

- Mammalian S6 [1].
- Drosophila S6 [2].
- Plant S6 [3].
- Yeast S10 (YS4).
- Halobacterium marismortui HS13 [4].
- Methanococcus jannaschii MJ1260.

S6 is the major substrate of protein kinases in eukaryotic ribosomes [5]; it may have an important role in controlling cell growth and proliferation through the selective translation of particular classes of mRNA.

These proteins have 135 to 249 amino acids.

A conserved stretch of 12 residues in the N-terminal part of these proteins has been selected as a signature pattern. 

- Consensus pattern: [LIVMH]-[STAMR]-G-G-x-D-x(2)-G-x-P-M

[ 1] Franco R., Rosenfeld M.G. J. Biol. Chem. 265:4321-4325(1990).

[ 2] Watson K.L., Konrad K.D., Woods D.F., Bryant P.J. Proc. Natl. Acad. Sci. U.S.A. 89:11302-11306(1992).
[ 3] Hansen G., Estruch J.J., Spena A. Nucleic Acids Res. 20:5230-5230(1992).
[ 4] Kimura M., Arndt E., Hatakeyama T., Hatakeyama T., Kimura J. Can. J. Microbiol. 35:195-199(1989).
[ 5] Bandi H.R., Ferrari S., Krieg J., Meyer H.E., Thomas G. J. Biol. Chem. 268:4530-4533(1993).

[1416] 563. Ribosomal protein S7 signature 

Ribosomal protein S7 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S7 is known to bind directly to part of the 3'-end of 16S ribosomal RNA. On the basis of sequence similarities [1,2,3], it belongs to a family of ribosomal proteins that includes:

- Eubacterial S7.
- Algal and plant chloroplast S7.
- Cyanelle S7.
- Archaebacterial S7.
- Plant mitochondrial S7.
- Mammalian S5.
- Plant S5.
- Caenorhabditis elegans S5 (T05E11.1).

The best conserved region located in the N-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [DENSK]-x-[LIVMDET]-x(3)-[LIVMFTA](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-...-[STAC]

[ 1] Klussmann S., Franke P., Bergmann U., Kostka S., Wittmann-Liebold B. Biol. Chem. Hoppe-Seyler 374:305-312(1993).

[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[ 3] Ignatovich O., Cooper M., Kulesza H.M., Beggs J.D. Nucleic Acids Res. 23:4616-4619(1995).

[1417] 564. Ribosomal protein S7e signature 

[1418] A number of eukaryotic ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these families consists of:

- Mammalian S7.
- Xenopus S8.
- Insect S7.
- Yeast probable ribosomal protein S7 (N2212).
- Fission yeast probable ribosomal protein S7 (SpAC18G6.13c).

These proteins have about 200 amino acids. A highly conserved stretch of 14 residues, which is located in the central section and which is rich in charged residues, was selected as a signature pattern.
[1419] Consensus pattern: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H

[1420] [1] Salazar C.E., Mills-Hamm D.M., Kumar V., Collins F.H. Nucleic Acids Res. 21:4147-4147(1993). 
[1421] 565. Ribosomal protein S8 signature 

Ribosomal protein S8 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S8 is known to bind directly to 16S ribosomal RNA. On the basis of sequence similarities [1], it belongs to a family of ribosomal proteins that includes:

- Eubacterial S8.
- Algal and plant chloroplast S8.
- Cyanelle S8.
- Archaebacterial S8.
- Marchantia polymorpha mitochondrial S8.
- Mammalian S15A.
- Plant S15A.
- Yeast S22 (S24).

The best conserved region located in the C-terminal section of these proteins has been selected as a signature pattern.

- Consensus pattern: [GE]-x(2)-[LIV](2)-[STY]-[ST]-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI]

[ 1] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).
[1422] 566. Ribosomal protein S8e signature 

A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities [1]. One of these families consists of:

- Mammalian S8.
- Caenorhabditis elegans S8 (F42C5.8).
- Leishmania major S8.
- Plant S8.
- Yeast S8 (S14) (Rp19).
- Archaebacterial S8e.

These proteins have either about 220 amino acids (in eukaryotes) or about 125 amino acids (in archaebacteria). A conserved stretch which is located in the N-terminal section and which is rich in positively charged residues has been selected as a signature pattern.

- Consensus pattern: [KR]-x(2)-[ST]-G-[GA]-x(5)-[HR]-[KG]-[KR]-x-K-x-E-[LM]-G

[ 1] Engemann S., Herfurth E., Briesemeister U., Wittmann-Liebold B. J. Protein Chem. 14:189-195(1995).

[1423] 567. Ribosomal protein S9 signature 

Ribosomal protein S9 is one of the proteins from the small ribosomal subunit. On the basis of sequence similarities [1,2], it belongs to a family of ribosomal proteins that includes:

- Eubacterial S9.
- Algal chloroplast S9.
- Cyanelle S9.
- Archaebacterial S9.
- Mammalian S16.
- Plant S16.
- Yeast mitochondrial ribosomal S9.

A conserved region containing many charged residues and located in the central section of these proteins has been selected as a signature pattern.

- Consensus pattern: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-...

[ 1] Chan Y.-L., Paz V., Olvera J., Wool I.G. FEBS Lett. 263:85-88(1990).
[ 2] Otaka E., Hashimoto T., Mizuta K. Protein Seq. Data Anal. 5:285-300(1993).

[1424] 568. Ribulose-phosphate 3-epimerase family signatures 

Ribulose-phosphate 3-epimerase (EC 5.1.3.1) (also known as pentose-5-phosphate 3-epimerase or PPE) is the enzyme that converts D-ribulose 5-phosphate into D-xylulose 5-phosphate in Calvin's reductive pentose phosphate cycle. In Alcaligenes eutrophus two copies of the gene coding for PPE are known [1]: one is chromosomally encoded (cbbEC), the other is on a plasmid (cbbEP). PPE has been found in a wide range of bacteria, archaebacteria, fungi and plants. The sequence of PPE is highly related to:

- Escherichia coli D-allulose-6-phosphate 3-epimerase (gene alsE).
- Escherichia coli protein sgcE.
- Mycoplasma genitalium hypothetical protein MG112.

All these proteins have from 209 to 241 amino acid residues. 

Two conserved regions, located respectively in the N-terminal and in the central part of these proteins, have been selected as signature patterns.

- Consensus pattern: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV]

- Consensus pattern: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC]

[ 1] Kusian B., Yoo J.G., Bednarski R., Bowien B. J. Bacteriol. 174:7337-7344(1992).

[1425] 569. (Ricin B lectin) Similarity to lectin domain of ricin beta-chain, 3 copies. 
[1426] This family consists of a triplicated domain involved in cell agglutination in ricin. 
[1427] 570. (Rotamase) PpiC-type peptidyl-prolyl cis-trans isomerase signature 

Peptidyl-prolyl cis-trans isomerase (EC 5.2.1.8) (PPIase or rotamase) is an enzyme that accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds in oligopeptides [1]. Most characterized PPIases belong to two families, the cyclophilin-type (see <PDOC00154>) and the FKBP-type (see <PDOC00426>). Recently a third family has been discovered [2,3]. So far, the only biochemically characterized member of this family is the Escherichia coli protein parvulin (gene ppiC), a small (92 residues) cytoplasmic enzyme that prefers amino acid residues with hydrophobic side chains like leucine and phenylalanine in the P1 position of the peptide substrates. PpiC is evolutionarily related to a number of proteins that are also probably PPIases:

- Escherichia coli and Haemophilus influenzae ppiD. PpiD is a PPIase which contains a periplasmic ppiC-like domain anchored to the inner membrane and which seems to be involved in the folding of outer membrane proteins.
- Escherichia coli surA. SurA is a periplasmic protein that contains two ppiC-like domains.
- Nitrogen-assimilating bacteria protein nifM, which is involved in the activation and stabilization of the iron component (nifH) of nitrogenase.
- Bacillus subtilis protein prsA, a membrane-bound lipoprotein involved in protein export.
- Lactococcus and Lactobacillus protease maturation protein prtM, a membrane-bound lipoprotein involved in the maturation of a secreted serine proteinase.
- Yeast protein ESS1/PTF1 (processing/termination factor 1).
- Drosophila protein dodo (gene dod).
- Mammalian protein PIN1.
- Campylobacter jejuni cell binding factor 2 (CBF2), a secreted antigen.
- Bacillus subtilis hypothetical protein yacD.
- Helicobacter pylori hypothetical protein HP0175.
- A hypothetical slime mold protein.

A conserved region that contains a serine which could play a role in the catalytic mechanism of these enzymes has 
been selected as a signature pattern. 

- Consensus pattern: F-[GSADE]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-...

[ 1] Fischer G., Schmid F.X. Biochemistry 29:2205-2212(1990).

[ 2] Rudd K.E., Sofia H.J., Koonin E.V., Plunkett G. III, Lazar S., Rouviere P.E. Trends Biochem. Sci. 20:14-15(1995).
[ 3] Rahfeld J.-U., Ruecknagel K.P., Schelbert B., Ludwig B., Hacker J., Mann K., Fischer G. FEBS Lett. 352:180-184(1994).

[1428] 571. (RrnaAD) Ribosomal RNA adenine dimethylases signature

A number of enzymes responsible for the dimethylation of adenosines in ribosomal RNAs (EC 2.1.1.48) have been found [1,2] to be evolutionarily related. These enzymes are:

- Bacterial 16S rRNA dimethylase (gene ksgA), which acts in the biogenesis of ribosomes by catalyzing the dimethylation of two adjacent adenosines in the loop of a conserved hairpin near the 3'-end of 16S rRNA. Inactivation of ksgA leads to resistance to the aminoglycoside antibiotic kasugamycin.
- Yeast 18S rRNA dimethylase (gene DIM1), which is functionally similar to ksgA and which dimethylates twin adenosines in the 3'-end of 18S rRNA.
- Bacterial 'erm' methylases. These enzymes confer resistance to macrolide-lincosamide-streptogramin B (MLS) antibiotics, such as erythromycin, by dimethylating the adenine residue at position 2058 of 23S rRNA, thus reducing the affinity between ribosomes and the MLS antibiotics.
- Caenorhabditis elegans hypothetical protein E02H1.1.

The best conserved region in these enzymes is located in the N-terminal section and corresponds to a region that is probably involved in S-adenosyl methionine (SAM) binding.

- Consensus pattern: ...-[LIVMFYHC]-E-x-D

[ 1] van Gemen B., van Knippenberg P.H. (In) Nucleic acid methylation, Clawson G.A., Willis D.B., Weissbach A., Jones P.A., Eds., pp.19-36, Alan R. Liss Inc, New-York (1990).

[ 2] Lafontaine D., Delcour J., Glasser A.L., Desgres J., Vandenhaute J. J. Mol. Biol. 241:492-497(1994).

[1429] 572. (RuBisCO small) Ribulose bisphosphate carboxylase, small chain. 206 members 
[1430] 573. ATP/GTP-binding site motif A (P-loop) (ras) 

From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs. The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. There are numerous ATP- or GTP-binding proteins in which the P-loop is found. A number of protein families for which the relevance of the presence of such a motif has been noted are listed below:

- ATP synthase alpha and beta subunits.
- Myosin heavy chains.
- Kinesin heavy chains and kinesin-like proteins.
- Dynamins and dynamin-like proteins.
- Guanylate kinase.
- Thymidine kinase.
- Thymidylate kinase.
- Nitrogenase iron protein family (nifH/frxC).
- ATP-binding proteins involved in 'active transport' (ABC transporters) [7].
- DNA and RNA helicases [8,9,10].
- GTP-binding elongation factors (EF-Tu, EF-1 alpha, etc.).
- Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
- Nuclear protein ran.
- ADP-ribosylation factors family.
- Bacterial dnaA protein.
- Bacterial recA protein.
- Bacterial recF protein.
- Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, G0, etc.).
- DNA mismatch repair proteins mutS family.
- Bacterial type II secretion system protein E.

Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
Consensus pattern: [AG]-x(4)-G-K-[ST]
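A hand-translated regular expression can illustrate the 'A' motif above and the adenylate kinase exception just described. This is a sketch only. The two short sequences below are fragments modelled on a Ras-like P-loop and on an adenylate-kinase-like loop ending in G-K-G-T. They are used purely for illustration and should not be read as database entries. find_p_loops is a hypothetical helper. A regex lookahead is used so that overlapping candidate sites would all be reported.

```python
import re

# [AG]-x(4)-G-K-[ST] as a Python regex. The lookahead makes finditer
# report overlapping hits; group(1) holds the matched octapeptide.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every P-loop match in seq."""
    return [(m.start(), m.group(1)) for m in P_LOOP.finditer(seq)]


ras_like = "MTEYKLVVVGAGGVGKSALTIQ"  # illustrative fragment ending ...G-K-S
adk_like = "MRIILLGAPGAGKGTQAQ"      # illustrative fragment ending ...G-K-G-T

print(find_p_loops(ras_like))  # [(9, 'GAGGVGKS')]
print(find_p_loops(adk_like))  # [] because Gly replaces Ser/Thr after G-K
```

The second result shows why a strict pattern misses adenylate kinase. A single substitution at the last position is enough to escape detection.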

In addition to the proteins listed above, the 'A' motif is also found in a number of other proteins. Most of these proteins probably bind a nucleotide, but others are definitely not ATP- or GTP-binding (for example chymotrypsin, or human ferritin light chain).

[ 1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[ 2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[ 3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[ 4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[ 5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[ 6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[ 7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[ 8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[ 9] Linder P., Lasko P., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).
[1431] GTP-binding nuclear protein ran signature (ras) 

Ran (or TC4) is a small abundant nuclear protein that binds and hydrolyzes GTP and which has been implicated in a large number of processes including nucleocytoplasmic transport, RNA synthesis, processing and export, and cell cycle checkpoint control [1,2]. Ran is generally included in the RAS 'superfamily' of small GTP-binding proteins [3], but it is only slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at its C-terminus and is therefore not subject to prenylation. Instead, ran has an acidic C-terminus. It is, however, similar to RAS family members in requiring a specific guanine nucleotide exchange factor (GEF) and a specific GTPase activating protein (GAP) as stimulators of overall GTPase activity. The region of the GTP-binding B motif which, in ran, is perfectly conserved has been selected as a signature pattern.
Consensus pattern: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y-
Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
[ 1] Scheffzek K., Klebe C., Fritz-Wolf K., Kabsch W., Wittinghofer A. Nature 374:378-381(1995).
[ 2] Rush M.G., Drivas G., D'Eustachio P. BioEssays 18:103-112(1996).
[ 3] Valencia A., Chardin P., Wittinghofer A., Sander C. Biochemistry 30:4637-4648(1991).
[1432] 574. recA signature 

The bacterial recA protein [1,2,3,E1] is essential for homologous recombination and recombinational repair of DNA damage. RecA has many activities: it forms filaments, it binds to single- and double-stranded DNA, it binds and hydrolyzes ATP, it is also a recombinase and, finally, it interacts with lexA, causing its activation and leading to its autocatalytic cleavage. RecA is a protein of about 350 amino-acid residues. Its sequence is very well conserved [3,4,5,E1] among eubacterial species. It is also found in the chloroplast of plants [6]. The best conserved region, a nonapeptide located in the middle of the sequence which is part of the monomer-monomer interface in a recA filament, has been selected as a signature pattern.

Consensus pattern: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R

[ 1] Smith K.C., Wang T.-C.V. BioEssays 10:12-16(1989).
[ 2] Lloyd A.T., Sharp P.M. J. Mol. Evol. 37:399-407(1993).
[ 3] Roca A.I., Cox M.M. Prog. Nucleic Acids Res. Mol. Biol. 56:129-223(1997).
[ 4] Karlin S., Weinstock G.M., Brendel V. J. Bacteriol. 177:6881-6893(1995).
[ 5] Eisen J.A. J. Mol. Evol. 41:1105-1123(1995).
[ 6] Cerutti H.D., Osman M., Grandoni R., Jagendorf A.T. Proc. Natl. Acad. Sci. U.S.A. 89:8068-8072(1992).
[E1] http://www.tigr.org/~jeisen/RecA/RecA.html

[1433] 575. Response regulator receiver domain 

This domain receives the signal from the sensor partner in bacterial two-component systems. It is usually found N-terminal to a DNA binding effector domain.

[ 1] Pao G.M., Saier M.H. J. Mol. Evol. 40:136-154(1995).

[1434] 576. Ribonucleotide reductase large subunit signature 

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). There are regions of similarity in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of these regions has been selected as a signature pattern.

[1435] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]-

[ 1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988).
[ 2] Reichard P. Science 260:1773-1777(1993).

[1436] 577. Ribonuclease T2 family histidine active sites 

The fungal ribonucleases T2 from Aspergillus oryzae, M from Aspergillus saitoi and Rh from Rhizopus niveus are






EP 1 033 405 A2 



structurally and functionally related 30 Kd glycoproteins [1] that cleave the 3'-5' internucleotide linkage of RNA via a nucleotide 2',3'-cyclic phosphate intermediate (EC 3.1.27.1). A number of other RNases have been found to be evolutionarily related to these fungal enzymes:

- Self-incompatibility [2] in flowering plants is often controlled by a single gene (S-gene) that has several alleles. This gene prevents fertilization by self-pollen or by pollen bearing either of the two S-alleles expressed in the style. The self-incompatibility glycoprotein from several higher plants of the Solanaceae family has been shown [2,3] to be a ribonuclease.
- Phosphate-starvation induced RNases LE and LX from tomato [4]. These two enzymes are probably involved in a phosphate-starvation rescue system.
- Escherichia coli periplasmic RNase I (EC 3.1.27.6) (gene rna) [5].
- Aeromonas hydrophila periplasmic RNase.
- Haemophilus influenzae hypothetical protein HI0526.

Two histidine residues have been shown [6,7] to be involved in the catalytic mechanism of RNase T2 and Rh. These residues and the region around them are highly conserved in all the sequences described above. Two signature patterns have been developed, one for each of the two active-site histidines. The second pattern also contains a cysteine which is known to be involved in a disulfide bond.
Consensus pattern: [FYWL]-x-[LIVM]-H-G-L-W-P [H is an active site residue]

Consensus pattern: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C [H is an active site residue] [C is involved in a disulfide bond]

[ 1] Watanabe H., Naitoh A., Suyama Y., Inokuchi N., Shimada H., Koyama T., Ohgi K., Irie M. J. Biochem. 108:303-310(1990).
[ 2] Haring V., Gray J.E., McClure B.A., Anderson M.A., Clarke A.E. Science 250:937-941(1990).
[ 3] McClure B.A., Haring V., Ebert P.R., Anderson M.A., Simpson R.J., Sakiyama F., Clarke A.E. Nature 342:955-957(1989).
[ 4] Loeffler A., Glund K., Irie M. Eur. J. Biochem. 214:627-633(1993).
[ 5] Meador J. III, Kennell D. Gene 95:1-7(1990).
[ 6] Kawata Y., Sakiyama F., Hayashi F., Kyogoku Y. Eur. J. Biochem. 187:255-262(1990).
[ 7] Kurihara H., Mitsui Y., Ohgi K., Irie M., Mizuno H., Nakamura K.T. FEBS Lett. 306:189-192(1992).

[1437] 578. Ribonucleotide reductase large subunit signature

Ribonucleotide reductase (EC 1.17.4.1) [1,2] catalyzes the reductive synthesis of deoxyribonucleotides from their corresponding ribonucleotides. It provides the precursors necessary for DNA synthesis. Ribonucleotide reductase is an oligomeric enzyme composed of a large subunit (700 to 1000 residues) and a small subunit (300 to 400 residues). There are regions of similarity in the sequence of the large chain from prokaryotes, eukaryotes and viruses. One of these regions has been selected as a signature pattern.
[1438] Consensus pattern: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[ASC]-x(2)-[PA]-
[1439] [ 1] Nilsson O., Lundqvist T., Hahne S., Sjoberg B.-M. Biochem. Soc. Trans. 16:91-94(1988).
[ 2] Reichard P. Science 260:1773-1777(1993).
[1440] 579. RNase H 

RNase H digests the RNA strand of an RNA/DNA hybrid. It is an important enzyme in the retroviral replication cycle and is often found as a domain associated with reverse transcriptases. Its structure is a mixed alpha+beta fold with three a/b/a layers.
[1441] 580. Eukaryotic putative RNA-binding region RNP-1 signature (rrm)

Many eukaryotic proteins that are known or supposed to bind single-stranded RNA contain one or more copies of a putative RNA-binding domain of about 90 amino acids [1,2]. This region has been found in the following proteins:

** Heterogeneous nuclear ribonucleoproteins **
- hnRNP A1 (helix destabilizing protein) (twice).
- hnRNP A2/B1 (twice).
- hnRNP C (C1/C2) (once).
- hnRNP E (UP2) (at least once).
- hnRNP G (once).
** Small nuclear ribonucleoproteins **
- U1 snRNP 70 Kd (once).
- U1 snRNP A (once).
- U2 snRNP B" (once).
** Pre-RNA and mRNA associated proteins **
- Protein synthesis initiation factor 4B (eIF-4B) [3], a protein essential for the binding of mRNA to ribosomes (once).
- Nucleolin (4 times).
- Yeast single-stranded nucleic acid-binding protein (gene SSB1) (once).
- Yeast protein NSR1 (twice). NSR1 is involved in pre-rRNA processing; it specifically binds nuclear localization sequences.
- Poly(A) binding protein (PABP) (4 times).
** Others **
- Drosophila sex determination protein Sex-lethal (Sxl) (twice).
- Drosophila sex determination protein Transformer-2 (Tra-2) (once).
- Drosophila 'elav' protein (3 times), which is probably involved in the RNA metabolism of neurons.
- Human paraneoplastic encephalomyelitis antigen HuD (3 times) [4], which is highly similar to elav and which may play a role in neuron-specific RNA processing.
- Drosophila 'bicoid' protein (once) [5], a segment-polarity homeobox protein that may also bind to specific mRNAs.
- La antigen (once), a protein which may play a role in the transcription of RNA polymerase III.
- The 60 Kd Ro protein (once), a putative RNP complex protein.
- A maize protein induced by abscisic acid in response to water stress, which seems to be an RNA-binding protein.
- Three tobacco proteins, located in the chloroplast [6], which may be involved in splicing and/or processing of chloroplast RNAs (twice).
- X16 [7], a mammalian protein which may be involved in RNA processing in relation with cellular proliferation and/or maturation.
- Insulin-induced growth response protein CI-4 from rat (twice).
- Nucleolysins TIA-1 and TIAR (3 times) [8], which possess nucleolytic activity against cytotoxic lymphocyte target cells and may be involved in apoptosis.
- Yeast RNA15 protein, which plays a role in mRNA stability and/or poly(A) tail length [9].

Inside the putative RNA-binding domain there are two regions which are highly conserved. The first one is a hydrophobic segment of six residues (which is called the RNP-2 motif); the second one is an octapeptide motif (which is called RNP-1 or RNP-CS). The position of both motifs in the domain is shown in the following schematic representation:

[1442]
xxxxxxx######xxxxxxxxxxxxxxxxxxxxxxxxxxxxx########xxxxxxxxxxxxxxxxxxxxxxxx
       RNP-2                              RNP-1
The RNP-1 motif has been used as a signature pattern for this type of domain.
Consensus pattern: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]
In most cases the residue in position 3 of the pattern is either Tyr or Phe.
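The RNP-1 pattern is the only one here that uses braces, which list residues forbidden at a position. The sketch below hand-translates that pattern into a Python regex, where the braces become a negated character class. It also reports the residue found at the braced position 3, the position the text notes is usually Tyr or Phe. The test peptides are invented for illustration, and has_rnp1 and rnp1_position3 are hypothetical helper names.

```python
import re
from typing import Optional

# [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]
# The braces {EDRKHPCG} (forbidden residues) become the class [^EDRKHPCG].
RNP1 = re.compile(r"[RK]G[^EDRKHPCG][AGSCI][FY][LIVA].[FYLM]")


def has_rnp1(seq: str) -> bool:
    """True if seq contains at least one RNP-1 match."""
    return RNP1.search(seq) is not None


def rnp1_position3(seq: str) -> Optional[str]:
    """Residue at pattern position 3 (the braced position), or None."""
    m = RNP1.search(seq)
    return m.group()[2] if m else None


print(has_rnp1("KRGFGFVTY"), rnp1_position3("KRGFGFVTY"))  # True F
print(has_rnp1("KRGDGFVTY"))  # False: Asp is forbidden at position 3
```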

[ 1] Bandziulis R.J., Swanson M.S., Dreyfuss G. Genes Dev. 3:431-437(1989).
[ 2] Dreyfuss G., Swanson M.S., Pinol-Roma S. Trends Biochem. Sci. 13:86-91(1988).
[ 3] Milburn S.C., Hershey J.W.B., Davies M.V., Kelleher K., Kaufman R.J. EMBO J. 9:2783-2790(1990).
[ 4] Szabo A., Dalmau J., Manley G., Rosenfeld M., Wong E., Henson J., Posner J.B., Furneaux H.M. Cell 67:325-333(1991).
[ 5] Rebagliati M. Cell 58:231-232(1989).
[ 6] Li Y., Sugiura M. EMBO J. 9:3059-3066(1990).
[ 7] Ayane M., Preuss U., Koehler G., Nielsen P.J. Nucleic Acids Res. 19:1273-1278(1991).
[ 8] Kawakami A., Tian Q., Duan X., Streuli M., Schlossman S.F., Anderson P. Proc. Natl. Acad. Sci. U.S.A. 89:8681-8685(1992).
[ 9] Minvielle-Sebastia L., Winsor B., Bonneaud N., Lacroute F. Mol. Cell. Biol. 11:3075-3087(1991).
[1443] 581. Rubredoxin signature 

[1444] Rubredoxins [1] are small electron-transfer prokaryotic proteins. They contain an iron atom which is ligated by four cysteine residues. Rubredoxins are, in some cases, functionally interchangeable with ferredoxins.
[1445] A conserved region that includes two of the cysteine residues that bind the iron atom has been selected as a pattern for these proteins.

[1446] Consensus pattern: [LIVM]-x(3)-W-x-C-P-x-C-[AGD] [The two C's bind the iron atom] 

In Pseudomonas oleovorans rubredoxin 2 (gene alkG) [2], this pattern is found twice because alkG has two rubredoxin domains.

Rubrerythrin [3], a protein with inorganic pyrophosphatase activity from Desulfovibrio vulgaris, possesses a C-terminal rubredoxin-like domain, but this domain is too divergent to be detected by the above pattern.

[1447] [1] Berg J.M., Holm R.H. (In) Iron-sulfur proteins, Spiro T.G., Ed., pp1-66, Wiley, New York (1982). [2] Kok M., Oldenhuis R., der Linden M.P.G., Meulenberg C.H.C., Kingma J., Witholt B. J. Biol. Chem. 264:5442-5451(1989). [3] van Beeumen J.J., van Driessche G., Liu M.-Y., Le Gall J. J. Biol. Chem. 266:20645-20653(1991).

[1448] 582. (rvp) Eukaryotic and viral aspartyl proteases active site 

Aspartyl proteases, also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes [1,2,3] known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centered on a catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain. Currently known eukaryotic aspartyl proteases are: - Vertebrate gastric pepsins A and C (also known as gastricsin). - Vertebrate chymosin (rennin), involved in digestion and used for making cheese. - Vertebrate lysosomal cathepsins D (EC …) and E (EC 3.4.23.34). - Mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from angiotensinogen in the plasma. - Fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29) and rhizopuspepsin (EC 3.4.23.21). - Yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4). PEP4 is implicated in posttranslational regulation of vacuolar hydrolases. - Yeast barrierpepsin (EC 3.4.23.35) (gene BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone. - Fission yeast sxa1, which is involved in degrading or processing the mating pheromones. Most retroviruses and some plant viruses, such as badnaviruses, encode an aspartyl protease which is a homodimer of a chain of about 95 to 125 amino acids. In most retroviruses, the protease is encoded as a segment of a polyprotein which is cleaved during the maturation process of the virus. It is generally part of the pol polyprotein and, more rarely, of the gag polyprotein. Conservation of the sequence around the two aspartates of eukaryotic aspartyl proteases and around the single active site of the viral proteases allows us to develop a single signature pattern for both groups of proteases.
Consensus pattern: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA] [D is the active site residue]
[1] Foltmann B. Essays Biochem. 17:52-84(1981). [2] Davies D.R. Annu. Rev. Biophys. Chem. 19:189-215(1990). [3] Rao J.K.M., Erickson J.W., Wlodawer A. Biochemistry 30:4663-4671(1991). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:105-120(1995).
[1449] 583. (rvt) Reverse transcriptase (RNA-dependent DNA polymerase)

A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses. Number of members: 1233

[1450] [1] Medline: 91006031. Origin and evolution of retroelements based upon their reverse transcriptase sequences. Xiong Y, Eickbush TH; EMBO J 1990;9:3353-3362.
[1451] 584. (S-AdoMet synt) S-adenosylmethionine synthetase signatures 

S-adenosylmethionine synthetase (EC 2.5.1.6) is the enzyme that catalyzes the formation of S-adenosylmethionine (AdoMet) from methionine and ATP [1]. AdoMet is an important methyl donor for transmethylation and is also the propylamino donor in polyamine biosynthesis. In bacteria there is a single isoform of AdoMet synthetase (gene metK); there are two in budding yeast (genes SAM1 and SAM2) and in mammals, while in plants there is generally a multigene family. The sequence of AdoMet synthetase is highly conserved throughout isozymes and species. Two signature patterns have been selected for this type of enzyme: the first is a hexapeptide which seems to be involved in ATP-binding; the second is an almost perfectly conserved glycine-rich nonapeptide.

Consensus pattern: G-A-G-D-Q-G-x(3)-G-[FYH]
Consensus pattern: G-[GA]-G-[ASC]-F-S-x-K-[DE]
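Since two independent signatures are defined for AdoMet synthetase, a scan can report which of them a given sequence carries. A minimal sketch, assuming the two patterns above hand-translated into Python regexes; the dictionary labels and the function name `matched_signatures` are illustrative, and both test sequences are made up:

```python
import re

# The two AdoMet synthetase signatures from this section, as regexes.
ADOMET_SIGNATURES = {
    "hexapeptide (possible ATP-binding)": re.compile(r"GAGDQG.{3}G[FYH]"),
    "glycine-rich nonapeptide": re.compile(r"G[GA]G[ASC]FS.K[DE]"),
}

def matched_signatures(seq: str) -> list[str]:
    """Names of the signatures found anywhere in seq, in dictionary order."""
    return [name for name, rx in ADOMET_SIGNATURES.items() if rx.search(seq)]

both = "MM" + "GAGDQGLMAGF" + "PPPP" + "GGGAFSGKD" + "EE"  # made-up sequence
first_only = "MMGAGDQGLMAGF"                               # made-up sequence
print(matched_signatures(both))        # ['hexapeptide (possible ATP-binding)', 'glycine-rich nonapeptide']
print(matched_signatures(first_only))  # ['hexapeptide (possible ATP-binding)']
```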

[1] Horikawa S., Sasuga J., Shimizu K., Ozasa H., Tsukada K. J. Biol. Chem. 265:13683-13686(1990).
[1452] 585. S1 RNA binding domain 

The S1 domain occurs in a wide range of RNA-associated proteins. It is structurally similar to cold shock protein, which binds nucleic acids. The S1 domain has an OB-fold structure.
[1] Bycroft M, Hubbard TJ, Proctor M, Freund SM, Murzin AG; Cell 1997;88:235-242.
[1453] 586. SAICAR synthetase signatures

Phosphoribosylaminoimidazole-succinocarboxamide synthase (EC 6.3.2.6) (SAICAR synthetase) catalyzes the seventh step in the de novo purine biosynthetic pathway: the ATP-dependent conversion of 5'-phosphoribosyl-5-aminoimidazole-4-carboxylic acid and aspartic acid to SAICAR [1]. In bacteria (gene purC), fungi (gene ADE1) and plants, SAICAR synthetase is a monofunctional protein; in higher vertebrates it is the N-terminal domain of a bifunctional enzyme that also has phosphoribosylaminoimidazole carboxylase (AIRC) activity. Two conserved regions in the central section of this enzyme have been selected as signature patterns for SAICAR synthetase.
Consensus pattern: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-[TA]-G-S
Consensus pattern: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G
[1] Zalkin H., Dixon J.E. Prog. Nucleic Acid Res. Mol. Biol. 42:259-287(1992).
[1454] 587. (SCP) Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signatures

A variety of extracellular proteins from eukaryotes have been found to be evolutionarily related: - Rodent sperm-coating glycoprotein (SCP), also known as acidic epididymal glycoprotein (AEG). This protein is thought to be involved in sperm maturation [1]. It is a protein of about 220 residues and probably contains eight disulfide bonds. - Mammalian testis-specific protein Tpx-1 [2]. Tpx-1 is highly related to SCP. - Mammalian glioma pathogenesis-related protein (GliPR). - Lizard helothermine, a toxin that blocks ryanodine receptors. - Venom allergen 5 (Ag5) from vespid wasps and venom allergen 3 (Ag3) from fire ants. These proteins are potent allergens and are the main cause of allergic reactions to stings from insects of the order Hymenoptera [3]. Ag5/3 are proteins of about 200 residues and contain four disulfide bonds. - Plant pathogenesis-related proteins of the PR-1 family [4]. These proteins are synthesized during pathogen infection or other stress-related responses. They are proteins of about 130 to 140 residues and probably contain three disulfide bonds. - Proteins Sc7 and Sc14 from the basidiomycete fungus Schizophyllum commune. These extracellular proteins are loosely associated with fruit body hyphal walls [5]. Sc7/14 are proteins of about 180 residues and probably contain two disulfide bonds. - Ancylostoma secreted protein from dog hookworm. - Yeast hypothetical proteins YJL078C, YJL079C and YKR013w. The exact function of these proteins is not yet known. Two conserved regions located in their C-terminal half have been selected as signature patterns. The second signature contains a cysteine which is known to be involved in a disulfide bond in Ag5.
Consensus pattern: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN]

Consensus pattern: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN] [C is involved in a disulfide bond]

[1] Mizuki N., Kasahara M. Mol. Cell. Endocrinol. 89:25-32(1992). [2] Kasahara M., Gutknecht J., Brew K., Spurr N., Goodfellow P.N. Genomics 5:527-534(1989). [3] Lu G., Villalba M., Coscia M.R., Hoffman D.R., King T.P. J. Immunol. 150:2823-2830(1993). [4] Dixon D.C., Cutt J.R., Klessig D.F. EMBO J. 10:1317-1324(1991). [5] Schuren F.H.J., Asgeirsdottir S.A., Kothe E.M., Scheer J.M.J., Wessels J.G.H. J. Gen. Microbiol. 139:2083-2090(1993).
[1455] 588. SET domain 

SET domains appear to be protein-protein interaction domains. It has been demonstrated that SET domains mediate interactions with a family of proteins that display similarity with dual-specificity phosphatases (dsPTPases) [2].

[1] Tripoulas N, LaJeunesse D, Gildea J, Shearn A; Genetics 1996;143:913-928. [2] Cui X, De Vivo I, Slany R, Miyamoto A, Firestein R, Cleary ML; Nat Genet 1998;18:331-337.
[1456] 589. Src homology 3 (SH3) domain profile

The Src homology 3 (SH3) domain is a small protein domain of about 60 amino acid residues first identified as a conserved sequence in the noncatalytic part of several cytoplasmic protein tyrosine kinases (e.g. Src, Abl, Lck) [1]. Since then, it has been found in a great variety of other intracellular or membrane-associated proteins [2,3,4,5]. The SH3 domain has a characteristic fold which consists of five or six beta-strands arranged as two tightly packed antiparallel beta sheets. The linker regions may contain short helices [6]. The function of the SH3 domain is not well understood. The current opinion is that SH3 domains mediate assembly of specific protein complexes via binding to proline-rich peptides [7]. In general SH3 domains are found as single copies in a given protein, but there is a significant number of proteins with two SH3 domains and a few with 3 or 4 copies. So far, SH3 domains have been identified in the following proteins: - Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases, in particular in the Src, Abl, Btk, Csk and ZAP70 families of kinases. - Mammalian phosphatidylinositol-specific phospholipase C-gamma-1 and -2. - Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit. - Mammalian Ras GTPase-activating protein (GAP). - Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK, all of which have two SH3 domains.

- Mammalian Vav oncoprotein, a guanine nucleotide exchange factor of the CDC24 family. - Some guanine-nucleotide releasing factors of the CDC25 family: yeast CDC25, yeast SCD25, fission yeast ste6. - MAGUK proteins. These proteins consist of at least three types of domains: one or more copies of the DHR domain, an SH3 domain and a C-terminal guanylate kinase domain. Members of this family are: Drosophila lethal(1)discs large-1 tumor suppressor protein (gene Dlg1), mammalian tight junction protein ZO-1, vertebrate erythrocyte membrane protein p55, Caenorhabditis elegans protein lin-2, rat protein CASK and mammalian synaptic proteins SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102. - Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: mammalian cytoplasmic protein Nck (3 copies), oncoprotein Crk (2 copies). - Chicken Src substrate p80/85 protein (cortactin) and the similar human hemopoietic lineage cell specific protein Hs1. - Mammalian dihydropyridine-sensitive L-type calcium channel beta (regulatory) subunit, including the related human myasthenic syndrome antigen B (MSYB).

- Mammalian neutrophil cytosolic activators of NADPH oxidase: p47 (NCF-1), p67 (NCF-2), and a potential homolog from Caenorhabditis elegans (B0303.7). NCF-1 and -2 have two copies of the SH3 domain, while B0303.7 has four. - Some myosin heavy chains from amoebae, slime molds and yeast (gene MYO3). - Vertebrate and Drosophila spectrin and fodrin alpha-chain. - Human amphiphysin. - Yeast actin-binding protein ABP1. - Yeast actin-binding protein SLA1 (3 copies). - Yeast protein BEM1 and the fission yeast homolog scd2 (or ral3) (2 copies). - Yeast BEM1-binding proteins BOI2 (BEB1) and BOB1 (BOI1). - Yeast fusion protein FUS1. - Yeast protein RVS167. - Yeast protein SSU81. - Yeast hypothetical proteins YAR014c (1 copy), YFR024c (1 copy), YHL002w (1 copy), YHR016c (1 copy), YJL020C (1 copy), YHR114W (2 copies) and the fission yeast homolog SpAC12C2.05c. - Caenorhabditis elegans hypothetical protein F42H10.3. The profile developed to detect SH3 domains is based on a structural alignment consisting of 5 gap-free blocks and 4 linker regions totaling 62 match positions.

[1] Mayer B.J., Hamaguchi M., Hanafusa H. Nature 332:272-275(1988). [2] Musacchio A., Gibson T., Lehto V.P., Saraste M. FEBS Lett. 307:55-61(1992). [3] Pawson T., Schlessinger J. Curr. Biol. 3:434-442(1993). [4] Mayer B.J., Baltimore D. Trends Cell Biol. 3:8-13(1993). [5] Pawson T. Nature 373:573-580(1995). [6] Kuriyan J., Cowburn D. Curr. Opin. Struct. Biol. 3:828-837(1993). [7] Morton C.J., Campbell I.D. Curr. Biol. 4:615-617(1994).
[1457] 590. Serine hydroxymethyltransferase pyridoxal-phosphate attachment site (SHMT)
Serine hydroxymethyltransferase (EC 2.1.2.1) (SHMT) [1] catalyzes the transfer of the hydroxymethyl group of serine to tetrahydrofolate to form 5,10-methylenetetrahydrofolate and glycine. In vertebrates, it exists in a cytoplasmic and a mitochondrial form, whereas only one form is found in prokaryotes. Serine hydroxymethyltransferase is a pyridoxal-phosphate containing enzyme. The pyridoxal-P group is attached to a lysine residue around which the sequence is highly conserved in all forms of the enzyme.

Consensus pattern: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-[GA] [K is the pyridoxal-P attachment site]
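The bracketed note singles out one residue inside the match, so a scan can report that residue's position directly by capturing it. A minimal sketch, assuming the SHMT pattern above hand-translated into a Python regex with the attachment lysine as a capture group; `plp_lysines` is a hypothetical name and the sequence is made up:

```python
import re

# [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-[GA],
# with the pyridoxal-P attachment lysine captured as group 1.
SHMT_PLP = re.compile(r"[DEH][LIVMFY].[STMV][GST][ST]{2}H(K)[ST][LF].G[PAC][RQ][GSA][GA]")

def plp_lysines(seq: str) -> list[int]:
    """1-based residue numbers of candidate pyridoxal-P attachment lysines."""
    return [m.start(1) + 1 for m in SHMT_PLP.finditer(seq)]

toy = "MA" + "DLSSGTTHKSLRGPRGA" + "EE"  # made-up sequence
print(plp_lysines(toy))  # [11]
```

Residue numbers are reported 1-based, the usual convention for protein positions; the regex match offsets themselves are 0-based.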

[ 1] Usha R., Savithri H.S.. Rao N.A. Biochim. Biophys. Acta 1204:75-83(1994). 
[1458] 591. SIS domain 

SIS (Sugar ISomerase) domains are found in many phosphosugar isomerases and phosphosugar binding proteins. 
[1] Teplyakov A, Obmolova G, Badet-Denisot MA, Badet B, Polikarpov I; Structure 1998;6:1047-1055.
[1459] 592. (SKI) Shikimate kinase signature

Shikimate kinase (EC 2.7.1.71) catalyzes the fifth step in the biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroK or aroL), plants and fungi (where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway). Shikimate kinase is a small protein of about 200 residues. A conserved region that contains a run of three glycines has been selected as a signature pattern.

Consensus pattern: [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]

Proteins belonging to this family also contain a copy of the ATP/GTP-binding motif 'A' (P-loop).
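The x(8,12) element allows a spacer of 8 to 12 arbitrary residues, which maps onto a bounded regex quantifier. A minimal sketch, assuming the shikimate kinase pattern above hand-translated into a Python regex and tested on made-up sequences (built by the hypothetical helper `toy_sequence`) whose spacer length is varied:

```python
import re

# [KR]-x(2)-E-x(3)-[LIVMF]-x(8,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF]
# The x(8,12) element becomes the bounded quantifier .{8,12}.
SKI = re.compile(r"[KR].{2}E.{3}[LIVMF].{8,12}[LIVMF]{2}[SA].G{3}.[LIVMF]")

def toy_sequence(gap: int) -> str:
    """Made-up sequence whose variable spacer has exactly `gap` residues."""
    return "KAAEAAAL" + "A" * gap + "LLSAGGGAL"

for gap in (7, 8, 12, 13):
    print(gap, bool(SKI.search(toy_sequence(gap))))
# 7 False / 8 True / 12 True / 13 False
```

Only spacers within the stated 8 to 12 range match; one residue fewer or more on either side breaks the hit.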

[1460] 593. SNAP-25 family

[1461] SNAP-25 (synaptosome-associated protein 25 kDa) proteins are components of SNARE complexes. Mem- 
bers of this family contain a cluster of cysteine residues that can be palmitoylated for membrane attachment [2]. 
[1462] [1] Brennwald P, Kearns B, Champion K, Keranen S, Bankaitis V, Novick P; Cell 1994;79:245-258. [2] Risinger C, Blomqvist AG, Lundell I, Lambertsson A, Nassel D, Pieribone VA, Brodin L, Larhammar D; J Biol Chem 1993;268:24408-24414.

[1463] 594. SNF2 and others N-terminal domain 

[1464] This domain is found in proteins involved in a variety of processes including transcription regulation (e.g., SNF2, STH1, brahma, MOT1), DNA repair (e.g., ERCC6, RAD16, RAD5), DNA recombination (e.g., RAD54), and chromatin unwinding (e.g., ISWI), as well as a variety of other proteins with little functional information (e.g., lodestar, ETL1).

[1465] 595. Staphylococcal nuclease homologues (Snase) 
Present in all three domains of cellular life. Four copies in the transcriptional coactivator p100. These, however, appear to lack the active site residues of Staphylococcal nuclease. Positions 14 (Asp-21), 34 (Arg-35), 39 (Asp-40), 42 (Glu-43) and 110 (Arg-87) [SNase numbering in parentheses] are thought to be involved in substrate-binding and catalysis.

[1] Ponting CP; Protein Sci 1997;6:459-463. [2] Callebaut I, Mornon JP; Biochem J 1997;321:125-132.
[1466] 596. SPRY domain

The SPRY domain is named from SPla and the RYanodine receptor. Domain of unknown function. Distant homologues are domains in butyrophilin/marenostrin/pyrin homologues.
[1] Ponting C, Schultz J, Bork P; Trends Biochem Sci 1997;22:193-194.
[1467] 597. (SQS PSY) Squalene and phytoene synthases signatures 

Two different polyisoprene synthases have been shown [1,2,3] to share a number of regions of sequence similarity: - Squalene synthase (EC 2.5.1.21) (farnesyl-diphosphate farnesyltransferase) (SQS), which catalyzes the conversion of two molecules of farnesyl diphosphate (FPP) into squalene. It is the first committed step in the cholesterol biosynthetic pathway. The reaction carried out by SQS is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules of FPP to form presqualene diphosphate; this intermediate is then rearranged in an NADP-dependent reduction to form squalene. SQS is found in eukaryotes. In yeast it is encoded by the ERG9 gene, in mammals by the FDFT1 gene. SQS seems to be membrane-bound. - Phytoene synthase (EC 2.5.1.-) (PSY), which catalyzes the conversion of two molecules of geranylgeranyl diphosphate (GGPP) into phytoene. It is the second step in the biosynthesis of carotenoids from isopentenyl diphosphate. The reaction carried out by PSY is catalyzed in two separate steps: the first is a head-to-head condensation of the two molecules of GGPP to form prephytoene diphosphate; this intermediate is then rearranged to form phytoene. PSY is found in all organisms that synthesize carotenoids: plants and photosynthetic bacteria as well as some non-photosynthetic bacteria and fungi. In bacteria PSY is encoded by the gene crtB. In plants PSY is localized in the chloroplast. As can be seen from the description above, SQS and PSY share a number of functional similarities which are also reflected at the level of their primary structure. In particular, three well conserved regions are shared by SQS and PSY; they could be involved in substrate binding and/or the catalytic mechanism. Signature patterns have been developed for the second and third conserved regions; they are localized in the central part of these enzymes.

Consensus pattern: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV]

Consensus pattern: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-x-P

[1] Summers C., Karst F., Charles A.D. Gene 136:185-192(1993). [2] Robinson G.W., Tsay Y.H., Kienzle B.K., Smith-Monroy C.A., Bishop R.W. Mol. Cell. Biol. 13:2706-2727(1993). [3] Roemer S., Hugueney P., Bouvier F., Camara B., Kuntz M. Biochem. Biophys. Res. Commun. 196:1414-1421(1993).

[1468] 598. SRP54-type proteins GTP-binding domain signature 

The signal recognition particle (SRP) is an oligomeric complex that mediates targeting and insertion of the signal sequence of exported proteins into the membrane of the endoplasmic reticulum. SRP consists of a 7S RNA and six protein subunits. One of these subunits, the 54 Kd protein (SRP54), is a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. The N-terminal 300 residues of SRP54 include the GTP-binding site (G-domain) and are evolutionarily related to similar domains in other proteins, which are listed below [1]. - Escherichia coli and Bacillus subtilis ffh protein (P48), a protein which seems to be the prokaryotic counterpart of SRP54. Ffh is associated with a 4.5S RNA in the prokaryotic SRP complex. - Signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane. The G-domain is located at the C-terminal extremity of the protein. - Bacterial ftsY protein, a protein which is believed to play a similar role to that of the docking protein in eukaryotes. The G-domain is located at the C-terminal extremity of the protein. - The pilA protein from Neisseria gonorrhoeae, which seems to be the homolog of ftsY. - A protein from the archaebacterium Sulfolobus solfataricus. This protein is also believed to be a docking protein. The G-domain is also at the C-terminus. - Bacterial flagellar biosynthesis protein flhF. The best conserved regions in those domains are the sequence motifs that are part of the GTP-binding site, but as those regions are not specific to these proteins, they were not used as a signature pattern. Instead, a conserved region located at the C-terminal end of the domain was selected.

Consensus pattern: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF]
[1] Althoff S., Selinger D., Wise J.A. Nucleic Acids Res. 22:1933-1947(1994).

[1469] 599. (STphosphatase) Serine/threonine specific protein phosphatases signature 

Serine/threonine specific protein phosphatases (EC 3.1.3.16) (PP) [1,2,3] are enzymes that catalyze the removal of a phosphate group attached to a serine or … evolutionarily related. - Protein phosphatase-1 (PP-1) is an enzyme of broad specificity. It is inhibited by two thermostable proteins, inhibitor-1 and -2. In mammals, there are two closely related isoforms of PP-1: PP-1 alpha and PP-1 beta, produced by alternative splicing of the same gene. In Emericella nidulans, PP-1 (gene bimG) plays an important role in mitosis control by reversing the action of the nimA kinase. In yeast, PP-1 (gene SIT4) is involved in dephosphorylating the large subunit of RNA polymerase II. - Protein phosphatase-2A (PP2A) is also an enzyme of broad specificity. PP2A is a trimeric enzyme that consists of a core composed of a catalytic subunit associated with a 65 Kd regulatory subunit and a third variable subunit. In mammals, there are two closely related isoforms of the catalytic subunit of PP2A: PP2A-alpha and PP2A-beta, encoded by separate genes. - Protein phosphatase-2B (PP2B or calcineurin), a calcium-dependent enzyme whose activity is stimulated by calmodulin. It is composed of two subunits: the catalytic A-subunit and the calcium-binding B-subunit. The specificity of PP2B is restricted. In addition to the above-mentioned enzymes, some additional serine/threonine specific protein phosphatases … listed below. - Mammalian phosphatase-X (PP-X) and Drosophila phosphatase-V (PP-V), which are closely related but yet distinct from PP2A. - Yeast phosphatase PPH3, which is similar to PP2A but with different enzymatic properties. - Drosophila phosphatase-Y (PP-Y), and yeast phosphatases Z1 and Z2 (genes PPZ1 and PPZ2), which are closely related but yet distinct from PP1. - Drosophila retinal degeneration protein C (gene rdgC), a calcium-binding phosphatase required to prevent light-induced retinal degeneration. - Phages Lambda and Phi-80 ORF-221, which have been shown to have phosphatase activity and are related to mammalian PP's. The best conserved region in these proteins is a perfectly conserved pentapeptide that can be used as a signature pattern.
Consensus pattern: [LIVM]-R-G-N-H-E
[1] Cohen P. Annu. Rev. Biochem. 58:453-508(1989). [2] Cohen P., Cohen P.T.W. J. Biol. Chem. 264:21435-21438(1989). [3] Cohen P.T.W., Brewis N.D., Hughes V., Mann D.J. FEBS Lett. 268:355-359(1990).
[1470] 600. Translation initiation factor SUI1 signature

In budding yeast (Saccharomyces cerevisiae), SUI1 is a translation initiation factor that functions in concert with eIF-2 and the initiator tRNA-Met in directing the ribosome to the proper start site of translation [1]. SUI1 is a protein of 108 residues. Close homologs of SUI1 have been found [2] in mammals, insects and plants. SUI1 is also evolutionarily related to hypothetical proteins from Escherichia coli (yciH), Haemophilus influenzae (HI1225) and Methanococcus vannielii. A conserved region in the C-terminal section has been selected as a signature pattern.
Consensus pattern: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV]

[1] Yoon H., Donahue T.F. Mol. Cell. Biol. 12:248-260(1992). [2] Fields C.A., Adams M.D. Biochem. Biophys. Res. Commun. 198:288-291(1994).
[1471] 601. (S T dehydratase) Serine/threonine dehydratases pyridoxal-phosphate attachment site
Serine and threonine dehydratases [1,2] are functionally and structurally related pyridoxal-phosphate dependent enzymes: - L-serine dehydratase (EC 4.2.1.13) and D-serine dehydratase (EC 4.2.1.14) catalyze the dehydration of L-serine (respectively D-serine) into ammonia and pyruvate. - Threonine dehydratase (EC 4.2.1.16) (TDH) catalyzes the dehydration of threonine into alpha-ketobutyrate and ammonia. In Escherichia coli and other microorganisms, two classes of TDH are known to exist. One is involved in the biosynthesis of isoleucine, the other in hydroxyamino acid catabolism. Threonine synthase (EC 4.2.99.2) is also a pyridoxal-phosphate enzyme; it catalyzes the transformation of homoserine-phosphate into threonine. It has been shown [3] that threonine synthase is distantly related to the serine/threonine dehydratases. In all these enzymes, the pyridoxal-phosphate group is attached to a lysine residue. The sequence around this residue is sufficiently conserved to allow the derivation of a pattern specific to serine/threonine dehydratases and threonine synthases.

Consensus pattern: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA] [The K is the pyridoxal-P attachment site]

[1] Ogawa H., Gomi T., Konishi K., Date T., Nakashima H., Nose K., Matsuda Y., Peraino C., Pitot H.C., Fujioka M. J. Biol. Chem. 264:15818-15823(1989). [2] Datta P., Goss T.J., Omnaas J.R., Patil R.V. Proc. Natl. Acad. Sci. U.S.A. 84:393-397(1987). [3] Parsot C. EMBO J. 5:3013-3019(1986). [4] Grabowski R., Hofmeister A.E.M., Buckel W. Trends Biochem. Sci. 18:297-300(1993).

[1472] Cysteine synthase/cystathionine beta-synthase pyridoxal-phosphate attachment site

Cysteine synthase (CSase) is the pyridoxal-phosphate dependent enzyme responsible [1] for the formation of cysteine from O-acetyl-serine and hydrogen sulfide with the concomitant release of acetic acid. In bacteria such as Escherichia coli, two forms of the enzyme are known (genes cysK and cysM). In plants there are also two forms, one located in the cytoplasm and the other in chloroplasts. Cystathionine beta-synthase [2] catalyzes the first irreversible step in homocysteine transsulfuration: the conjugation of homocysteine and serine forming cystathionine. Like CSase it is a pyridoxal-phosphate dependent enzyme. The two types of enzymes are evolutionarily related. The pyridoxal-phosphate group of CSases has been shown to be attached to a lysine residue which is located in the N-terminal section of these enzymes; the sequence around this residue is highly conserved and can be used as a signature pattern to detect this class of enzymes.

Consensus pattern: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM] [The 2nd K is the pyridoxal-P attachment site]

[1] Saito K., Kurosawa M., Murakoshi I. FEBS Lett. 328:111-114(1993). [2] Swaroop M., Bradley K., Ohura T., Tahara T., Roper M.D., Rosenberg L.E., Kraus J.P. J. Biol. Chem. 267:11455-11461(1992).
[1473] 602. S-locus glycop

S-locus glycoprotein family. In Brassicaceae, self-incompatible plants have a self/non-self recognition system … (S). S-locus glycoproteins, as well as S-receptor kinases, are in linkage with the S-alleles [1]. Number of members: 128

[1] Evolutionary aspects of the S-related genes of the Brassica self-incompatibility system: synonymous and nonsyn…

[2] Polymorphism of the S-locus glycoprotein gene (SLG) and the S-locus related gene (SLR1) in Raphanus sativus … Brassicaceae. Sakamoto K, Kusaba M, Nishio T; Mol Gen Genet …

[1474] 603. (sdh cyt) Succinate dehydrogenase cytochrome b subunit signatures 

Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic component … - Cytochrome b-556 from bacterial SDH (gene sdhC). - Cytochrome b560 from the mammalian mitochondrial SDH complex. - Cytochrome b560 subunit encoded in the mitochondrial genome … - … yeast mitochondrial SDH complex (gene SDH3 or CYB3). - Protein cyt-1 from Caenorhabditis. These cytochromes are proteins of about 130 residues that comprise three transmembrane regions. There are two conserved histidines which may be involved in binding the heme group. Two signature patterns have been developed that include these histidine residues.

Consensus pattern: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(4)-S-x(2)-H-R-x-[ST] [H could be a heme ligand]

Consensus pattern: … H-x(3)-[GA]-[LIVM] … [H could be a heme ligand]

[1] … J. Mol. Biol. 250:484-495(1995).

[1475] 604. …

Number of members: 40

[1476] 605. Protein secE/sec61-gamma signature

In bacteria the secE protein plays a role in protein export; it is one of the components, with secY and secA, of the preprotein translocase. In eukaryotes, the evolutionarily related protein sec61-gamma … through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-alpha and beta … a single transmembrane region … terminal segment … residues that contains two additional transmembrane domains). The sequence … well conserved; however, it is possible to derive a signature pattern centered on a conserved proline … residues before the beginning of the transmembrane domain.

[1] … Rapoport T.A. Nature 367:654-657(1994).

[1477] 606. 11-S plant seed storage proteins signature

… the developing plant, can be classified, on the basis of their structure, into different families. 11-S are non-glycosylated proteins which form hexameric structures [1,2]. Each of the subunits in the hexamer is itself composed of an acidic and a basic chain … as shown in the following representation:

xxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
<-------Acidic-subunit-------><------Basic-subunit------>
<-----------About 480 to 500 residues------------------>

'C': conserved cysteine involved in a disulfide bond. '*': position of the pattern.

Proteins that belong to the 11-S family are: pea and broad bean legumins, rape … the acidic and basic subunits (Asn- … a disulfide bond.

[1] Hayashi M., Mori H., Nishimura M., Akazawa T., Hara-Nishimura I. Eur. J. Biochem. 172:627-632(1988). [2] Shotwell M.A., Afonso C., Davies E., Chesnut R.S., Larkins B.A. Plant Physiol. 87:698-704(1988).
[1478] 607. 7S seed storage protein 

… gymnosperms. The 7S storage …

Number of members: 67 
[1480] [1] The three-dimensional structure of canavalin from jack bean (Canavalia ensiformis). Ko TP, Ng JD, McPherson A; Plant Physiol 1993;101:729-744.

[1481] 608. Aspartate-semialdehyde dehydrogenase signature

Aspartate-semialdehyde dehydrogenase (ASD) catalyzes the second step in the common biosynthetic pathway leading from Asp to diaminopimelate and Lys, to Met, and to Thr: the NADP-dependent reductive dephosphorylation of L-aspartyl phosphate to L-aspartate-semialdehyde. In bacteria and fungi, ASD is a protein of about 40 Kd (340 to 370 residues) whose sequence is not extremely well conserved [1]. A conserved cysteine residue has been implicated as important for the catalytic activity [2]. The region of conservation around the active site residue is too small to be used as a signature pattern. Another, more conserved region, located in the last third of the sequence and containing both a conserved cysteine and a histidine, has been used instead.
Consensus pattern: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA]

[1] Baril C., Richaud C., Fournie E., Baranton G., Saint Girons I. J. Gen. Microbiol. 138:47-53(1992). [2] Karsten W.E., Viola R.E. Biochim. Biophys. Acta 1121:234-238(1992).
[1482] N-acetyl-gamma-glutamyl-phosphate reductase active site
N-acetyl-gamma-glutamyl-phosphate reductase (EC 1.2.1.38) (AGPR) [1,2] is the enzyme that catalyzes the third step in the biosynthesis of arginine from glutamate: the NADP-dependent reduction of N-acetyl-5-glutamyl phosphate into N-acetylglutamate 5-semialdehyde. In bacteria it is a monofunctional protein of 35 to 38 Kd (gene argC), while in fungi it is part of a bifunctional mitochondrial enzyme (gene ARG5,6, argil or arg-6) which contains an N-terminal acetylglutamate kinase (EC 2.7.2.8) domain and a C-terminal AGPR domain. In the Escherichia coli enzyme, a cysteine has been shown to be implicated in the catalytic activity; the region around this residue is well conserved and can be used as a signature pattern.

Consensus pattern: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P [C is the active site residue]

[1] Ludovice M., Martin J.F., Carrachas P., Liras P. J. Bacteriol. 174:4606-4613(1992). [2] Gessert S.F., Kim J.H., Nargang F.E., Weiss R.L. J. Biol. Chem. 269:8189-8203(1994).
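The consensus patterns in these entries use PROSITE notation: one element per residue position, joined by hyphens, where x matches any residue, [...] lists the allowed residues, {...} lists forbidden ones, and (n) or (n,m) gives a repeat count. A minimal Python sketch, assuming only that notation, translates the AGPR pattern above into a regular expression and reports where the active-site cysteine falls in a match. The helper names and the demo sequence are hypothetical; the sequence is synthetic, built only to satisfy the pattern, and is not a real AGPR.

```python
import re
from typing import List, Optional

# One PROSITE element: a residue letter, 'x' (any residue), [allowed] or
# {forbidden} residues, optionally followed by (n) or (n,m) repeats.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str, capture: Optional[int] = None) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    If `capture` is given, the element at that 0-based position is wrapped
    in a capture group so its location in a match can be reported.
    """
    parts = []
    for i, element in enumerate(pattern.rstrip(".").split("-")):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a single residue or a [...] class is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        if i == capture:
            regex = "(" + regex + ")"
        parts.append(regex)
    return "".join(parts)


# AGPR signature from the text; element 5 (0-based) is the active-site cysteine.
AGPR = "[LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P"


def active_site_positions(seq: str, pattern: str, site: int) -> List[int]:
    """Return the 1-based position of the flagged residue in each match."""
    regex = re.compile(prosite_to_regex(pattern, capture=site))
    return [m.start(1) + 1 for m in regex.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical, synthetic sequence constructed only to satisfy the pattern.
    demo = "MKT" + "LGAPGCFATGAAAGLAP" + "KE"
    print(prosite_to_regex(AGPR))
    print(active_site_positions(demo, AGPR, site=5))  # [9]
```

Wrapping only the flagged element in a capture group keeps the translation generic: the same helper can report the catalytic residue of any pattern in this listing once the index of that element is known.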

[1483] 609. Sialyltransferase family.

Number of members: 18 

[1484] 610. SpoU rRNA methylase family

This family of proteins probably uses S-AdoMet. Number of members: 58

[1] SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases. Koonin EV, Rudd KE; Nucleic Acids Res 1993;21:5519-5519. [2] The spoU gene of Escherichia coli, the fourth gene of the spoT operon, is essential for tRNA (Gm18) 2'-O-methyltransferase activity. Persson BC, Jager G, Gustafsson C; Nucleic Acids Res 1997;25:4093-4097.

[1485] 611. Stathmin family signatures

Stathmin [1] (from the Greek 'stathmos', which means relay) is a ubiquitous intracellular protein, present in a variety of phosphorylated forms, which serves as a relay for diverse second messenger pathways. Its expression and phosphorylation are regulated throughout development and in response to extracellular signals regulating cell proliferation, differentiation and function. Stathmin is a highly conserved protein of 149 amino acid residues. Structurally, it consists of an N-terminal domain of about 45 residues, followed by a 78-residue alpha-helical domain consisting of a heptad repeat coiled-coil structure, and a C-terminal domain of 25 residues. Protein SCG10 is a neuron-specific membrane-associated protein that accumulates in the growth cones of developing neurons. It is highly similar in its sequence to stathmin, but differs in that it contains an additional N-terminal hydrophobic segment of 32 residues which is probably responsible for its interaction with membranes. Xenopus protein XB3 is also evolutionarily related to stathmin and also contains an additional N-terminal hydrophobic domain [2]. A conserved decapeptide which ends with the first three residues of the coiled-coil domain, and a second pattern that corresponds to part of the central region of the coiled coil, have been selected as signatures for proteins of the stathmin family.
Consensus pattern: P-[KRQ]-[KR](2)-[DE]-x-S-L-[EG]-E- 
Consensus pattern: A-E-K-R-E-H-E-[KR]-E-

[1] ... [2] Maucuer A., Moreau J., Mechali M., Sobel A. J. Biol. Chem. 268:16420-16429(1993).

[1486] 612. SUA5/yciO/yrdC family signature. The following uncharacterized proteins have been shown [1] to share regions of similarity: - Yeast protein SUA5. - Escherichia coli hypothetical protein yciO and HI1198, the corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein yrdC and HI0656, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein ywlC. - Mycobacterium leprae hypothetical protein in the rfe-hemK intergenic region. - Methanococcus jannaschii hypothetical protein MJ0062. These are proteins of 20 to 46 Kd which contain a number of conserved regions in their N-terminal section. They can be picked up in the database by the following pattern.

[1487] Consensus pattern: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS]-
[1488] [1] Bairoch A., Rudd K.E., Robison K. Unpublished observations (1995).
[1489] 613. Sucrose synthase






EP 1 033 405 A2 



Sucrose synthases catalyze the synthesis of sucrose from UDP-glucose and fructose. This family includes the bulk of the sucrose synthase protein. However, the carboxyl-terminal region of the sucrose synthases belongs to the glycosyl transferase family Glycos_transf_1.
[1490] 614. Sulfotransferase proteins
Number of members: 59 

[1491] 615. Synaptophysin / synaptoporin signature

Synaptophysin and synaptoporin [1] are structurally related proteins, found in the membrane of synaptic vesicles, which may function as ionic or solute channels. These two glycoproteins seem to span the membrane four times. Both their N- and C-terminal sequences seem to be located in the cytoplasm. As a signature pattern for this family of proteins, a highly conserved region located at the beginning of the first intravesicular loop, just after the first transmembrane domain, has been selected. This region contains a cysteine residue that may be involved in a disulfide bond.
Consensus pattern: L-S-V-[DE]-C-x-N-K-T [C may be involved in a disulfide bond]
[1] Knaus P., Marqueze-Pouey B., Scherer H., Betz H. Neuron 5:453-462(1990).
[1492] 616. Syndecans signature 

Syndecans [1,2] (from the Greek 'syndein', to bind together) are a family of transmembrane heparan sulfate proteoglycans which are implicated in the binding of extracellular matrix components and growth factors. Syndecans bind a variety of molecules via their heparan sulfate chains and can act as receptors or as co-receptors. Structurally, these proteins consist of four separate domains: a) a signal sequence; b) an extracellular domain (ectodomain) of variable length whose sequence is not evolutionarily conserved in the various forms of syndecans, and which contains the sites of attachment of the heparan sulfate glycosaminoglycan side chains; c) a transmembrane region; d) a highly conserved cytoplasmic domain of about 30 to 35 residues which could interact with cytoskeletal proteins. The proteins known to belong to this family are: - Syndecan 1. - Syndecan 2 or fibroglycan. - Syndecan 3 or neuroglycan or N-syndecan. - Syndecan 4 or amphiglycan or ryudocan. - Drosophila syndecan. - Caenorhabditis elegans probable syndecan (F57C7.3). The signature pattern that has been developed for syndecans starts with the last residue of the transmembrane region and includes the first 10 residues of the cytoplasmic domain. This region, which contains four basic residues, could act as a stop-transfer site.
Consensus pattern: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y 

[1] Bernfield M., Kokenyesi R., Kato M., Hinkes M.T., Spring J., Gallo R.L., Lose E.J. Annu. Rev. Cell Biol. 8:365-393(1992). [2] David G. FASEB J. 7:1023-1030(1993).
[1493] 617. Syntaxin / epimorphin family signature 

The following proteins have been shown to be evolutionarily related [1,2,3]: - Epimorphin (or syntaxin 2), a mammalian mesenchymal protein which plays an essential role in epithelial morphogenesis. - Syntaxin 1A (also known as antigen HPC-1) and syntaxin 1B, which are synaptic proteins that may be involved in docking of synaptic vesicles at presynaptic active zones. - Syntaxin 3. - Syntaxin 4, which is potentially involved in docking of synaptic vesicles at presynaptic active zones. - Syntaxin 5, which mediates endoplasmic reticulum to Golgi transport. - Syntaxin 6, which is involved in intracellular vesicle trafficking. - Syntaxin 7. - Yeast PEP12 (or VPS6), which is required for the transport of proteases to the vacuole. - Yeast SED5, which is required for the fusion of transport vesicles with the Golgi complex. - Yeast SSO1 and SSO2, which are required for vesicle fusion with the plasma membrane. - Yeast VAM3, which is required for vacuolar assembly. - Arabidopsis thaliana protein KNOLLE, which may be involved in cytokinesis. - Caenorhabditis elegans hypothetical proteins F35C8.4, F48F7.2, F55A11.2 and T01B11.3. The above proteins share the following characteristics: a size ranging from ... Kd to 40 Kd; a C-terminal extremity which is highly hydrophobic and is probably involved in anchoring the protein to the membrane; and a central, well conserved region which seems to be in a coiled-coil conformation. The pattern specific for this family is based on the most conserved region of the coiled-coil domain.
Consensus pattern: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-[LIVM]-[FS]-x(2)-[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-[DNQT]-x-[LIVMF]-[DESV]-x(2)-[LIVM]

[1] Bennett M.K., Garcia-Arraras J.E., Elferink L.A., Peterson K., Fleming A.M., Hazuka C.D., Scheller R.H. Cell 74:863-873(1993). [2] Spring J., Kato M., Bernfield M. Trends Biochem. Sci. 18:124-125(1993). [3] Pelham H.R.B. Cell 73:425-426(1993).
[1494] 618. Sm protein 

The U1, U2, U4/U6 and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of the major spliceosomal small nuclear RNAs. These proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker.

[1495] [1] Hermann H, Fabrizio P, Raker VA, Foulaki K, Hornig H, Brahms H, Luhrmann R; EMBO J 1995;14:2076-2088. [2] Kambach C, Walke S, Young R, Avis JM, de la Fortelle E, Raker VA, Luhrmann R, Li J, Nagai K; Cell 1999;96:375-387.

[1496] 619. Skp1 family

[1497] [1] Stebbins CE, Kaelin WG Jr, Pavletich NP; Science 1999;284:455-461.
[1498] 620. Protein secY signatures

The eubacterial secY protein [1] plays an important role in protein export. It interacts with the signal sequences of secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains ten transmembrane segments. Such a structure probably confers on secY a translocator function, providing a channel for periplasmic and outer-membrane precursor proteins. Homologs of secY are found in archaebacteria [2]. SecY is also encoded in the chloroplast genome of some algae [3], where it could be involved in a prokaryotic-like protein export system across the two membranes of the chloroplast endoplasmic reticulum (CER), which is present in chromophyte and cryptophyte algae. Two signature patterns have been developed for secY proteins. The first corresponds to the second transmembrane region, which is the most conserved section of these proteins. The second spans the C-terminal part of the fourth transmembrane region, a short intracellular loop, and the N-terminal part of the fifth transmembrane region.
Consensus pattern: [GST]-[LIVMF](2)-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](2)-x-[AS]-[GSTQ]-[LIVMFAT](3)-x-[LIVMFA]

Consensus pattern: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-[LIVMF](3)

[1499] 621. (Seed protein) Small hydrophilic plant seed proteins signature. The following small hydrophilic plant seed proteins are structurally related: - Arabidopsis thaliana proteins GEA1 and GEA6. - Cotton late embryogenesis abundant (LEA) protein D-19. - Carrot EMB-1 protein. - Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4. - Maize late embryogenesis abundant protein Emb564. - Radish late seed maturation protein p8B6. - Rice embryonic abundant protein Emp1. - Sunflower 10 Kd late embryogenesis abundant protein (DS10). - Wheat Em proteins. These proteins may play a role [1,2] in equipping the seed for survival, maintaining a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They may also play a role during imbibition by controlling water uptake. As a signature pattern, the best conserved region in the sequence of these proteins has been selected: a glycine-rich nonapeptide located ...
[1500] Consensus pattern: G-[EQ]-T-V-V-P-G-G-T

[1501] [1] Dure L. III, Crouch M., Harada J., Ho T.-H.D., Mundy J., Quatrano R., Thomas T., Sung Z.R. Plant Mol. Biol. 12:475-486(1989). [2] Gaubier P., Raynal M., Hull G., Huestis G.M., Grellet F., Arenas C., Pages M., Delseny M. Mol. Gen. Genet. 238:409-418(1993).

[1502] 622. Serine carboxypeptidases, active sites 

All known carboxypeptidases are either metallo carboxypeptidases or serine carboxypeptidases. The catalytic activity of the serine carboxypeptidases, like that of the trypsin family serine proteases, is provided by a charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine, which is itself hydrogen-bonded to a serine [1]. Proteins known to be serine carboxypeptidases are: - Barley and wheat serine carboxypeptidases I, II and III [2]. - Yeast carboxypeptidase Y (YSCY) (gene PRC1), a vacuolar protease involved in degrading small peptides. - Yeast KEX1 protease, involved in killer toxin and alpha-factor precursor processing. - Fission yeast sxa2, a probable carboxypeptidase involved in degrading or processing mating pheromones [3]. - Penicillium janthinellum carboxypeptidase S1 [4]. - ... - Aspergillus satoi carboxypeptidase cpdS. - Vertebrate protective protein cathepsin A [5], a lysosomal protein which is not only a carboxypeptidase but also essential for the activity of both beta-galactosidase and neuraminidase. - Mosquito vitellogenic carboxypeptidase (VCP) [6]. - Naegleria fowleri virulence-related protein Nf314 [7]. - Yeast hypothetical protein YBR139w. - Caenorhabditis elegans hypothetical proteins ... - ...nitrile lyase (hydroxynitrile lyase) (HNL) [8], an enzyme involved in plant cyanogenesis. The sequences surrounding the active site serine and histidine residues are highly conserved in all these serine carboxypeptidases.
Consensus pattern: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue]

Consensus pattern: ...-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-[PSA] [H is the active site residue]

[1] Liao D.-I., Remington S.J. J. Biol. Chem. 265:6528-6531(1990). [2] Sorensen S.B., Svendsen I., Breddam K. Carlsberg Res. Commun. 54:193-202(1989). [3] Imai Y., Yamamoto M. Mol. Cell. Biol. 12:1827-1834(1992). [4] Svendsen I., ... 333:39-43(1993). [5] Galjart N.J., Morreau H., ..., Bonten E.J., d'Azzo A. J. Biol. Chem. 266:14754-14762(1991). [6] Cho W.-L., Deitsch K.W., Raikhel A.S. Proc. Natl. Acad. Sci. U.S.A. 88:10821-10824(1991). [7] Hu W.N., Kopachik W., Band R.N. Infect. Immun. 60:2418-2424(1992). [8] Wajant H., Mundry K.-W., Pfizenmaier K. Plant Mol. Biol. 26:735-746(1994). [9] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1503] 623. Serpins signature
Serpins (SERine Proteinase INhibitors) [1,2,3,4] are a group of structurally related proteins. They are high molecular weight (400 to 500 amino acids), extracellular, irreversible serine protease inhibitors with a well-defined structural-functional characteristic: a reactive region that acts as a 'bait' for an appropriate serine protease. This region is found in the C-terminal part of these proteins. Proteins which are known to belong to the serpin
family are listed below (references are only provided for recently determined sequences): - Alpha-1 protease inhibitor (alpha-1-antitrypsin, contrapsin). - Alpha-1-antichymotrypsin. - Antithrombin III. - Alpha-2-antiplasmin. - Heparin cofactor II. - Complement C1 inhibitor. - Plasminogen activator inhibitors 1 (PAI-1) and 2 (PAI-2). - Glia-derived nexin (GDN) (protease nexin I). - Protein C inhibitor. - Rat hepatocyte SPI-1, SPI-2 and SPI-3 inhibitors. - Human squamous
cell carcinoma antigen (SCCA), which may act in the modulation of the host immune response against tumor cells. - A lepidopteran protease inhibitor. - Leukocyte elastase inhibitor which, in contrast to other serpins, is an intracellular protein. - Neuroserpin [5], a neuronal inhibitor of plasminogen activators and plasmin. - Cowpox virus crmA [6], an inhibitor of the thiol protease interleukin-1 beta converting enzyme (ICE). CrmA is the only serpin known to inhibit a non-serine proteinase. - Some orthopoxvirus probable protease inhibitors, which may be involved in the regulation of the blood clotting cascade and/or of the complement cascade in the mammalian host. On the basis of strong sequence similarities, a number of proteins with no known inhibitory activity are said to belong to this family: - Bird ovalbumin and the related gene X and Y proteins. - Angiotensinogen, the precursor of the angiotensin active peptide. - Barley protein Z, the major endosperm albumin. - Corticosteroid-binding globulin (CBG). - Thyroxine-binding globulin (TBG). - Sheep uterine milk protein (UTMP) and pig uteroferrin-associated protein (UFAP). - Hsp47, an endoplasmic reticulum heat-shock protein that binds strongly to collagen and could act as a chaperone in the collagen biosynthetic pathway [7]. - Maspin, which seems to function as a tumor suppressor [8]. - Pigment epithelium-derived factor precursor (PEDF), a protein with a strong neurotrophic activity [9]. - Ep45, an estrogen-regulated protein from Xenopus [10]. A signature pattern has been developed for this family of proteins, centered on a well conserved Pro-Phe sequence which is found ten to fifteen residues on the C-terminal side of the reactive bond.

Consensus pattern: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-[LIVMFAH]
[1] Carrell R., Travis J. Trends Biochem. Sci. 10:20-24(1985). [2] Carrell R., Pemberton P.A., Boswell D.R. Cold Spring Harbor Symp. Quant. Biol. 52:527-535(1987). [3] Huber R., Carrell R.W. Biochemistry 28:8951-8966(1989). [4] Remold-O'Donnell E. FEBS Lett. 315:105-108(1993). [5] Osterwalder T., Contartese J., Stoeckli E.T., Kuhn T.B., Sonderegger P. EMBO J. 15:2944-2953(1996). [6] Komiyama T., Ray C.A., Pickup D.J., Howard A.D., Thornberry N.A., Peterson E.P., Salvesen G. J. Biol. Chem. 269:19331-19337(1994). [7] Clarke E., Sanwal B.D. Biochim. Biophys. Acta 1129:246-248(1992). [8] Zou Z., Anisowicz A., Neveu M., Rafidi K., Sheng S., Sager R., Hendrix M.J., Seftor E., ... Science 263:526-529(1994). [9] Steele F.R., Chader G.J., Johnson L.V., Tombran-Tink J. Proc. Natl. Acad. Sci. U.S.A. 90:1526-1530(1993). [10] Holland L.J., Suksang C., Wall A.A., Roberts L.R., Moser D.R., Bhattacharya A. J. Biol. Chem. 267:7053-7059(1992).
[1505] 624. Sigma-54 interaction domain signatures and profile 

Some bacterial regulatory proteins activate the expression of genes from promoters recognized by core RNA polymerase associated with the alternative sigma-54 factor. These have a conserved domain of about 230 residues involved in the ATP-dependent [1,2] interaction with sigma-54. This domain has been found in the proteins listed below: - acoR from Alcaligenes eutrophus, an activator of the acetoin catabolism operon acoXABC. - algB from Pseudomonas aeruginosa, an activator of the alginate biosynthetic gene algD. - dctD from Rhizobium, an activator of dctA, the C4-dicarboxylate
transport protein. - dhaR from Citrobacter freundii, a regulator of the dha operon for glycerol utilization. - fhlA from Escherichia coli, an activator of the formate dehydrogenase H and hydrogenase III structural genes. - flbD from Caulobacter crescentus, an activator of flagellar genes. - hoxA from Alcaligenes eutrophus, an activator of the hydrogenase operon. - hrpS from Pseudomonas syringae, an activator of hrpD as well as other hrp loci involved in plant pathogenicity. - hupR1 from Rhodobacter capsulatus, an activator of the [NiFe] hydrogenase genes hupSL. - hydG from Escherichia coli and Salmonella typhimurium, an activator of the hydrogenase activity. - levR from Bacillus subtilis, which regulates the expression of the levanase operon (levDEFG and sacC). - nifA (as well as anfA and vnfA) from various bacteria, an activator of the nif nitrogen-fixing operon. - ntrC from various bacteria, an activator of nitrogen assimilatory genes such as that for glutamine synthetase (glnA) or of the nif operon. - pgtA from Salmonella typhimurium, the activator of the inducible phosphoglycerate transport system. - pilR from Pseudomonas aeruginosa, an activator of pilin gene transcription. - rocR from Bacillus subtilis, an activator of genes for arginine utilization. - tyrR from Escherichia coli, involved in the transcriptional regulation of aromatic amino-acid biosynthesis and transport. - wtsA from Erwinia stewartii, an activator of plant pathogenicity gene wtsB. - xylR from Pseudomonas putida, the activator of the TOL plasmid
xylene catabolism operon xylCAB and of xylS. - Escherichia coli hypothetical protein yfhA. - Escherichia coli hypothetical protein yhgB. About half of these proteins (algB, dctD, flbD, hoxA, hupR1, hydG, ntrC, pgtA and pilR) belong to signal transduction two-component systems [3] and possess, in their N-terminal section, a domain that can be phosphorylated by a sensor-kinase protein. Almost all of these proteins possess a helix-turn-helix DNA-binding domain in their C-terminal section. The domain which interacts with the sigma-54 factor has an ATPase activity. This may be required to promote a conformational change necessary for the interaction [4]. The domain contains an atypical ATP-binding motif A (P-loop) as well as a form of motif B. The two ATP-binding motifs are located in the N-terminal section of the domain; signature patterns have been developed for both motifs. Other regions of the domain are also conserved. One of them, located in the C-terminal section, has been selected as a third signature pattern.
Consensus pattern: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY]









Consensus pattern: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEK]-G-[STIM]-[LIVMFY](3)-[DE]-[EKH]-[LIVM]
Consensus pattern: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT]

[1] ... (1993). [2] Austin S., Kundrot C., Dixon R. Nucleic Acids Res. ... [3] ... 23:311-336(1989). [4] Austin S., ...

[1506] 625. Sigma-70 factors family signatures

Sigma factors [1] are bacterial transcription initiation factors that promote the attachment of the core RNA polymerase ... sigma-70 (rpoD) ... as the major or primary ... the transcription of a wide variety of genes. The other sigma factors, known as alternative sigma factors, are required for the transcription of specific subsets of genes. With regard ... a wide variety of sigma factors, some of which are listed below: - Bacillus sigma factors involved in the control of sporulation-specific genes: sigma-E (sigE or spoIIGB), sigma-... (... rpoH), involved in the expression of heat shock genes. - Escherichia coli and related bacteria sigma-27 (gene fliA), involved in the expression of the flagellin gene. - Escherichia ... external stresses. - Myxococcus xanthus sigma-B (sigB), which is essential for the late-stage differentiation of that bacterium. Alignments of the sigma-70 family permit the identification of four regions of high conservation, each of which ... have been developed. The first pattern corresponds to sub-region 2.2; the exact function of this region is not known, but it may be involved in the binding of the factor to RNA polymerase. ... in binding the conserved -35 region of promoters recognized by the major sigma factors. The second pattern starts one residue before the N-terminal extremity of the HTH region and ends six residues after its C-terminal extremity.
Consensus pattern: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSA]-[LIVMAP]
Consensus pattern: ...

[1] Helmann J.D., Chamberlin M.J. Annu. Rev. Biochem. 57:839-872(1988). [2] Gribskov M., Burgess R.R. Nucleic Acids Res. ... [3] ... Gross C.A. J. Bacteriol. 174:3843-3849(1992). [4] Lonetto M.A., Brown K.L., Rudd K.E., Buttner M.J. Proc. Natl. Acad. Sci. U.S.A. 91:7573-7577(1994).
[1507] 626. Signal carboxyl-terminal domain. 430 members. 
[1508] 627. Signal peptidases I signatures 

Signal peptidases (SPases) remove the signal peptides from secretory proteins ... responsible for the processing of the ... type II (gene lsp), which only processes lipoproteins, and a third type involved in the ... the cytoplasmic membrane by one (in B. subtilis) or two (in E. coli) N-terminal transmembrane domains, with the main part of the protein protruding into the periplasmic space. Two residues have been shown [2,3] to be essential for the catalytic activity of SPase I: a serine and a lysine. ... yeast mitochondrial inner membrane protease subunit 1 and ..., which catalyze the removal of signal peptides required for the targeting of proteins from the mitochondrial matrix, across the inner membrane, into the inter-membrane space [4]. In eukaryotes the removal of ... (SPC21) subunits as well as the yeast SEC11 subunit have been shown [5] to share regions of sequence similarity with prokaryotic SPases I and yeast IMP1/IMP2. Three signature patterns for these ... the third signature corresponds to a conserved region of unknown biological significance which is located in the C-terminal section of all these proteins.
Consensus pattern: [GS]-x-S-M-x-[PS]-[AT]-[LF] [S is an active site residue]
Consensus pattern: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY] [K is an active site residue]
Consensus pattern: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]

[1] ... 474-478(1992). [2] Sung M., Dalbey R.E. J. Biol. Chem. 267:... [3] ... [4] ... 1997-2004(1993). [5] van Dijl J.M., de Jong A., Vehmaanpera J., Venema G., Bron S. EMBO J. 11:2819-2828(1992). [6] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
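With three signatures for one family, a scan is easiest to read when each pattern carries the index of its flagged residue. The sketch below assumes the PROSITE-to-regex translation described for the consensus patterns in this listing (re-declared here so the block runs on its own); all names and the demo sequence are hypothetical, and the sequence is synthetic rather than a real signal peptidase. It reports the active-site serine and lysine and the start of the C-terminal signature.

```python
import re
from typing import Dict, List, Optional, Tuple

# One PROSITE element: residue, 'x', [allowed] or {forbidden}, optional (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str, capture: Optional[int] = None) -> str:
    """Translate a PROSITE-style pattern into a regex, capturing one element if asked."""
    parts = []
    for i, element in enumerate(pattern.rstrip(".").split("-")):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append("(" + regex + ")" if i == capture else regex)
    return "".join(parts)


# Signal peptidase I signatures from the text, each with the 0-based index of
# its flagged active-site element (None when no residue is flagged).
SPASE_I: Dict[str, Tuple[str, Optional[int]]] = {
    "serine site": ("[GS]-x-S-M-x-[PS]-[AT]-[LF]", 2),
    "lysine site": ("K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY]", 0),
    "C-terminal region": ("[LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG]", None),
}


def scan(seq: str) -> Dict[str, List[int]]:
    """Map each signature to 1-based positions of its active-site residue,
    or of the match start when no residue is flagged."""
    hits: Dict[str, List[int]] = {}
    for name, (pattern, site) in SPASE_I.items():
        regex = re.compile(prosite_to_regex(pattern, capture=site))
        group = 0 if site is None else 1
        hits[name] = [m.start(group) + 1 for m in regex.finditer(seq)]
    return hits


if __name__ == "__main__":
    # Hypothetical, synthetic sequence with one copy of each signature.
    demo = "M" + "GASMAPAL" + "EEEE" + "KRLLGAPGDALAL" + "EEEE" + "LLAAGDNAAASAAS" + "EE"
    print(scan(demo))
    # {'serine site': [4], 'lysine site': [14], 'C-terminal region': [31]}
```

Keeping the flagged index next to each pattern mirrors the bracketed annotations in the text ("S is an active site residue", "K is an active site residue"), so the table can be extended by copying patterns verbatim.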
[1509] 628. (sodcu) Copper/Zinc superoxide dismutase signatures
Copper/Zinc superoxide dismutase (SODC) [1] is one of the three forms of an enzyme that catalyzes the dismutation of superoxide radicals. SODC binds one atom each of zinc and copper. Various forms of SODC are known: a cytoplasmic form in eukaryotes, an additional chloroplast form in plants, an extracellular form in some eukaryotes, and a periplasmic form in prokaryotes. The metal binding sites are conserved in all the known SODC sequences [2]. Two signature patterns have been derived for this family of enzymes: the first one contains two histidine residues that bind the copper atom; the second one is located in the C-terminal section of SODC and contains a cysteine which is involved in a disulfide bond.
Consensus pattern: [GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE] [The two H's are copper ligands]

Consensus pattern: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV] [C is involved in a disulfide bond]

[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).
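Some signatures flag more than one residue, such as the two copper-ligand histidines in the first Cu/Zn SOD pattern. One way to handle that, sketched here under the same assumptions as any PROSITE-to-regex translation (hypothetical helper names; the demo sequence is synthetic, not a real SODC), is to capture every flagged element. The reported positions follow the order of the flagged elements in the pattern.

```python
import re
from typing import List, Sequence

# One PROSITE element: residue, 'x', [allowed] or {forbidden}, optional (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str, captures: Sequence[int] = ()) -> str:
    """Translate a PROSITE-style pattern; elements whose 0-based index is
    listed in `captures` become regex capture groups."""
    parts = []
    for i, element in enumerate(pattern.rstrip(".").split("-")):
        m = _ELEMENT.fullmatch(element.strip())
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append("(" + regex + ")" if i in captures else regex)
    return "".join(parts)


# Cu/Zn SOD signatures from the text.
COPPER_SITE = "[GA]-[IMFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGDE]"  # H's at elements 2 and 4
DISULFIDE_SITE = "G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV]"  # C at element 8


def flagged_positions(seq: str, pattern: str, captures: Sequence[int]) -> List[List[int]]:
    """For each match, return the 1-based positions of the captured residues."""
    regex = re.compile(prosite_to_regex(pattern, captures))
    groups = range(1, len(set(captures)) + 1)
    return [[m.start(g) + 1 for g in groups] for m in regex.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical, synthetic sequence containing one copy of each signature.
    demo = "MA" + "GFHVHAAGSAS" + "KKKK" + "GGSGARASCAAV" + "K"
    print(flagged_positions(demo, COPPER_SITE, [2, 4]))  # [[5, 7]]
    print(flagged_positions(demo, DISULFIDE_SITE, [8]))  # [[26]]
```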

[1510] 629. (sodfe) Manganese and iron superoxide dismutases signature

Manganese superoxide dismutase (SODM) [1] is one of the three forms of an enzyme that catalyzes the dismutation of superoxide radicals. The four ligands of the manganese atom are conserved in all the known SODM sequences. These metal ligands are also conserved in the related iron form of superoxide dismutases [2,3]. A short conserved region which includes two of the four ligands, an aspartate and a histidine, has been selected as a signature.
Consensus pattern: D-x-W-E-H-[STA]-[FY](2) [D and H are manganese/iron ligands]

[1] Bannister J.V., Bannister W.H., Rotilio G. CRC Crit. Rev. Biochem. 22:111-154(1987). [2] Parker M.W., Blake C.C.F. FEBS Lett. 229:377-382(1988). [3] Smith M.W., Doolittle R.F. J. Mol. Evol. 34:175-184(1992).
[1511] 630. Spectrin repeat 

[1512] Spectrin repeats are found in several proteins involved in cytoskeletal structure. These include spectrin, alpha-actinin and dystrophin. The sequence repeat used in this family is taken from the structural repeat in reference [2]. The spectrin repeat forms a three-helix bundle. The second helix is interrupted by proline in some sequences.
Number of members: 898

[1] Actin-binding proteins. 1: Spectrin super family. Hartwig JH; Protein Profile 1995;2:732-732. [2] Crystal structure of the repetitive segments of spectrin. Yan Y, Winograd E, Viel A, Cronin T, Harrison SC, Branton D; Science 1993;262:2027-2030.

[1513] 631. (subtilase) Streptomyces subtilisin-type inhibitors signature

Bacteria of the Streptomyces family produce a family of proteinase inhibitors [1] characterized by their strong activity toward subtilisin. They are collectively known as SSI's: Streptomyces Subtilisin Inhibitors. Some SSI's also inhibit trypsin or chymotrypsin. In their mature secreted form, SSI's are proteins of about 110 residues with two conserved disulfide bonds, as shown in the following schematic representation:

xxxxxxxxxxxxxxCxxxxxxxCxxxxxxxxxCx#xxxxxxxxxxxxCxxxxxx

'C': conserved cysteine involved in a disulfide bond. '#': active site residue.

Consensus pattern: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L [The two C's are involved in a disulfide bond]
[1] Taguchi S., Kojima S., Terabe M., Miura K.-I., Momose H. Eur. J. Biochem. 220:911-918(1994).
[1514] 632. Sugar transport proteins signatures 

In mammalian cells the uptake of glucose is mediated by a family of closely related transport proteins which are called the glucose transporters [1,2,3]. At least seven of these transporters are currently known to exist (in humans they are encoded by the GLUT1 to GLUT7 genes). These integral membrane proteins are predicted to comprise twelve membrane-spanning domains. The glucose transporters show sequence similarities [4,5] with a number of other sugar or metabolite transport proteins listed below (references are only provided for recently determined sequences): - Escherichia coli arabinose-proton symport (araE). - Escherichia coli galactose-proton symport (galP). - Escherichia coli and Klebsiella pneumoniae citrate-proton symport (also known as citrate utilization determinant) (gene cit). - Escherichia coli alpha-ketoglutarate permease (gene kgtP). - Escherichia coli proline/betaine transporter (gene proP) [6]. - Escherichia coli xylose-proton symport (xylE). - Zymomonas mobilis glucose facilitated diffusion protein (gene glf). - Yeast high and low affinity glucose transport proteins (genes SNF3, HXT1 to HXT14). - Yeast galactose transporter (gene GAL2). - Yeast maltose permeases (genes MAL3T and MAL6T). - Yeast myo-inositol transporters (genes ITR1 and ITR2). - Yeast carboxylic acid transporter protein homolog JEN1. - Yeast inorganic phosphate transporter (gene PHO84). - Kluyveromyces lactis lactose permease (gene LAC12). - Neurospora crassa quinate transporter (gene Qa-y), and Emericella nidulans quinate permease (gene qutD). - Chlorella hexose carrier (gene HUP1). - Arabidopsis thaliana glucose transporter (gene STP1). - Spinach sucrose transporter. - Leishmania donovani transporters D1 and D2. - Leishmania enriettii probable transport protein (LTP). - Yeast hypothetical proteins YBR241c, YCR98c and YFL040w. - Caenorhabditis elegans hypothetical protein ZK637.1. - Escherichia coli hypothetical proteins yabE, ydjE and yhjE. - Haemophilus influenzae hypothetical proteins HI0281 and HI0418. - Bacillus subtilis hypothetical proteins yxbC and yxdF. It has been suggested [4] that these transport proteins have evolved from the duplication of an ancestral protein with six transmembrane regions; this hypothesis is based on the conservation of two G-R-[KR] motifs. The first one is located between the second and third transmembrane domains and the second one between transmembrane
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G-R-IKR] motif, but because this motif is too short to be specific to this family of proteins, a pattern from a larger regton 
centered onthesecond copy Of this motif wasderived. The second p 

^d^'JfJl'se^erts^' '"'^ transmembrane segment and in the short loop region between the fourth 

Consensus pattern: [LIVIVISTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-(LIVMFYWA]-G- R-[RKl-x(4 6)-rGSTAl 
Consensus pattern: [LIVI\4FI-x-G-[LIVMFA]-x(2)-G-x(8HLIFY]-x(2HEQ]-x(6)-[RK] ' '^''^ * 
!iQqnuTR^,J!!-^r.""J^^l^'^^^^ 60:757-794(1991)12] Gould G.W.. Bell G.I. Trends Biochem. Sci. 15:18-23 
(1990).[ 3] Baldwin SA Bioch.m. Biophys. Acta 11 54:1 7-49(1 993).[ 4] Maiden M.C.J., Davis E.O.. Baldwin S.A.. Moore 
m n;i'h^r?r? K = 325:641-643(1987).(5] Henderson P.J.R Curr. Opin. Struct. Biol. 1:590-601(1991). 
11-276(1993) Marangon. A.G.. Milner J.L. Steer B.A.. van Nues R.W.. Wood J.M. J. Mol. Biol. 229: 

[1515] 633. Synaptobrevin signature

Synaptobrevin [1] is an intrinsic membrane protein of small synaptic vesicles whose function is not yet known but which is highly conserved in mammals, electric ray (where it is known as VAMP-1), Drosophila and yeast [2]. In yeast there are two closely related forms of synaptobrevin (genes SNC1 and SNC2), while in mammals there are at least 4 forms. Structurally, synaptobrevin consists of an N-terminal cytoplasmic domain of 90 to 110 residues followed by a transmembrane region, and then by a short (from 2 to 22 residues) C-terminal intravesicular domain. As a signature pattern for synaptobrevin, a highly conserved stretch of residues located in the central part of the sequence was selected.

Consensus pattern: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-[STDE]-x-[LIVM]-x-[DE]

[1] Suedhof T.C., Baumert M., Perin M.S., Jahn R. Neuron 2:1475-1481(1989). [2] Gerst J.E., Rodgers L., Riggs M., Wigler M. Proc. Natl. Acad. Sci. U.S.A. 89:4338-4342(1992).

[1516] 634. TBC domain. Identification of a TBC domain in GYP6_YEAST and GYP7_YEAST, which are GTPase activator proteins of yeast Ypt6 and Ypt7, implies that these domains are GTPase activator proteins of Rab-like small GTPases.

[1] Medline: 96032578. Molecular cloning of a cDNA with a novel domain present in the tre-2 oncogene and the yeast cell cycle regulators BUB2 and cdc16. Richardson PM, Zon LI; Oncogene 1995;11:1139-1148.
[2] Medline: 97398935. A shared domain between a spindle assembly checkpoint protein and Ypt/Rab-specific GTPase-activators. Neuwald AF; Trends Biochem Sci 1997;22:243-244.

[1517] 635. Transcription factor TFIID repeat signature (TBP) 

Transcription factor TFIID (or TATA-binding protein, TBP) [1,2] is a general factor that plays a major role in the activation of eukaryotic genes transcribed by RNA polymerase II. TFIID binds specifically to the TATA box promoter element [...]. There is a remarkable degree of sequence conservation of a [...] domain. A notable structural feature of this domain is the presence of two conserved repeats [...]. This intramolecular symmetry generates a saddle-shaped structure that sits astride the DNA [3]. [...] [4] is a sequence-specific transcription factor that also binds to the TATA box. [...] also possess a TBP homolog [5]. A signature pattern that spans the last 50 residues of the repeated region has been derived.

Consensus pattern: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x(3)-G-[TAGLHKR]-x(7)-[AGC]-x(7)-[LIVM]

[1] Hoffmann A., Sinn E., Yamamoto T., Wang J., Roy A., Horikoshi M., Roeder R.G. Nature 346:387-390(1990). [2] Gasch A., Hoffmann A., Horikoshi M., Roeder R.G., Chua N.-H. Nature 346:390-394(1990). [3] Nikolov D.B., Hu S.-H., Lin J., Gasch A., Hoffmann A., Horikoshi M., Chua N.-H., Roeder R.G. [...] [4] [...] Tjian R. Nature [...] [5] [...] Proc. Natl. Acad. Sci. U.S.A. 91:4180-4184(1994).

[1518] 636. Translationally controlled tumor protein signatures (TCTP) 

Mammalian translationally controlled tumor protein (TCTP) (or P23) is a protein which has been found to be preferentially synthesized during the growth phase of some types of tumor [1,2], but which is also expressed in normal cells. The physiological function of TCTP is still not known. It is a hydrophilic protein of 18 to 20 Kd. Close homologs of TCTP have been found in [...] Caenorhabditis elegans (F52H2.11), Hydra, budding yeast [...]. Two signature patterns have been developed for TCTP.

Consensus pattern: [IFA]-[GA]-[GAS]-N-[PAK]-S-[GA]-E-[GDE]-[PAGE]-[DEQGA]
Consensus pattern: [FLVH]-[FY]-[IVCT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAST]-x-[LV]-[AV]-x(3)-[FYW]
[1] Boehm H., Benndorf R., Gaestel M., Gross B., Nuernberg P., Kraft R., Otto A., Bielka H. Biochem. Int. 19:277-286(1989). [2] Makrides S., Chitpatima S.T., Bandyopadhyay R., Brawerman G. Nucleic Acids Res. 16:2350-2350(1988). [3] Pay A., Heberle-Bors E., Hirt H. Plant Mol. Biol. 19:501-503(1992). [4] Stuerzenbaum S.R., Kille P., Morgan A.J. Biochim. Biophys. Acta 1398:294-304(1998). [5] Rasmussen S.W. Yeast 10:S63-S68(1994).
[1519] 637. TFIIS zinc ribbon domain signature 

Transcription factor S-II (TFIIS) [1] is a eukaryotic protein necessary for efficient RNA polymerase II transcription elongation past template-encoded pause sites. TFIIS shows DNA-binding activity only in the presence of RNA polymerase II. It is a protein of about 300 amino acids whose sequence is highly conserved in mammals, Drosophila, yeast (where it was first known as PPR2, a transcriptional regulator of URA4, and then as DST1, the DNA strand transfer protein alpha [2]) and in the archaebacterium Sulfolobus acidocaldarius [3]. This family also includes the eukaryotic and archaebacterial RNA polymerase subunits of the 15 Kd / M family (see <PDOC00790>) as well as the following viral proteins:
- Vaccinia virus RNA polymerase 30 Kd subunit (rpo30) [4].
- African swine fever virus protein I243L [5].
The best conserved region of all these proteins contains four cysteines that bind a zinc ion and fold in a conformation termed a 'zinc ribbon' [6]. Besides these cysteines, there are a number of other conserved residues which can be used to help define a specific pattern for this type of domain.

Consensus pattern: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-x(6)-C-x(2,5)-C-x(3)-[FW] [The four C's are zinc ligands]

[1] Hirashima S., Hirai H., Nakanishi Y., Natori S. J. Biol. Chem. 263:3858-3863(1988). [2] Kipling D., Kearsey S.E. Nature 353:509-509(1991). [3] Langer D., Zillig W. Nucleic Acids Res. 21:2251-2251(1993). [4] Ahn B.-Y., Gershon P.D., Jones E.V., Moss B. Mol. Cell. Biol. 10:5433-5441(1990). [5] Rodriguez J.M., Salas M.L., Vinuela E. Virology 186:40-52(1992). [6] Qian X., Jeon C., Yoon H., Agarwal K., Weiss M.A. Nature 365:277-279(1993).
[1520] 638. Tetrahydrofolate dehydrogenase/cyclohydrolase signatures (THF DHG CYH) 

Enzymes that participate in the transfer of one-carbon units are involved in various biosynthetic pathways. In many of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by formyltetrahydrofolate synthetase (EC 6.3.4.3), methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5 or EC 1.5.1.15) and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9). The dehydrogenase and cyclohydrolase activities are expressed by a variety of multifunctional enzymes:
- Eukaryotic C-1-tetrahydrofolate synthase (C1-THF synthase), which catalyzes all three reactions described above. Two forms of C1-THF synthase are known [1]: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms the dehydrogenase/cyclohydrolase domain is located in the N-terminal section of the 900-amino-acid protein and consists of about 300 amino acid residues. The C1-THF synthases are NADP-dependent.
- Eukaryotic mitochondrial bifunctional dehydrogenase/cyclohydrolase [2]. This is a homodimeric NAD-dependent enzyme of about 300 amino acid residues.
- Bacterial folD [3]. FolD is a homodimeric bifunctional NADP-dependent enzyme of about 290 amino acid residues.
The sequence of the dehydrogenase/cyclohydrolase domain is highly conserved in all forms of the enzyme. Two conserved regions have been selected as signature patterns. The first one is located in the N-terminal part of these enzymes and contains three acidic residues. The second pattern is a highly conserved sequence of 9 amino acids which is located in the C-terminal section.

Consensus pattern: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(5)-[LIVMF](3)-Q-L-P-[LV]
Consensus pattern: P-G-G-V-G-P-[MF]-T-[IV] 

[1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988). [2] Belanger C., Mackenzie R.E. J. Biol. Chem. 264:4837-4843(1989). [3] D'Ari L., Rabinowitz J.C. J. Biol. Chem. 266:23953-23958(1991).
[1521] 639. Triosephosphate isomerase active site (TIM) 

Triosephosphate isomerase (EC 5.3.1.1) (TIM) [1] is the glycolytic enzyme that catalyzes the reversible interconversion of glyceraldehyde 3-phosphate and dihydroxyacetone phosphate. TIM plays an important role in several metabolic pathways and is essential for efficient energy production. It is a dimer of identical subunits, each of which is made up of about 250 amino-acid residues. A glutamic acid residue is involved in the catalytic mechanism [2]. The sequence around the active site residue is perfectly conserved in all known TIMs and can be used as a signature pattern for this type of enzyme.

Consensus pattern: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK] [E is the active site residue]

[1] Lolis E., Alber T., Davenport R.C., Rose D., Hartman F.C., Petsko G.A. Biochemistry 29:6609-6618(1990). [2] Knowles J.R. Nature 350:121-124(1991).
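The consensus patterns in these entries use PROSITE-style notation: residues separated by '-', 'x' for any residue, square brackets for allowed alternatives, curly braces for excluded residues, and (n) or (n,m) for repeat counts. As an illustration only, the minimal Python sketch below translates such a pattern into a regular expression and scans a sequence with it, using the TIM active-site pattern above. The helper names (`prosite_to_regex`, `find_motifs`) and the short test sequence are hypothetical; the converter assumes only the subset of the syntax seen here, does not handle PROSITE's '<' and '>' terminal anchors, and reports non-overlapping matches only.

```python
import re

# One pattern element: 'x', a single residue, an allowed set [..] or an excluded
# set {..}, optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Only the subset of the syntax used in these entries is covered;
    terminal anchors ('<', '>') raise ValueError.
    """
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # Single letters and [..] sets are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)


def find_motifs(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    tim_pattern = "[AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK]"
    print(prosite_to_regex(tim_pattern))                 # [AV]YEP[LIVM]W[SA]IGT[GK]
    # Hypothetical toy sequence, not a real database entry.
    print(find_motifs(tim_pattern, "GGAYEPVWAIGTGKGG"))  # [(3, 'AYEPVWAIGTG')]
```

Because Python's `re.finditer` skips overlapping hits, a real scanner that needs every overlapping occurrence would have to restart the search one residue after each match start.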

[1522] 640. Thymidine kinase cellular-type signature (TK) 

Thymidine kinase (TK) (EC 2.7.1.21) is a ubiquitous enzyme that catalyzes the ATP-dependent phosphorylation of thymidine. A comparison of TK sequences has shown [1,2,3] that there are two different families of TK. One family groups together TK from herpesviruses as well as cellular thymidylate kinases, while the second family currently consists of TK from the following sources:
- Vertebrates.
- Bacteria.
- Bacteriophage T4.
- Pox viruses.
- African swine fever virus (ASF).
- Fish lymphocystis disease virus (FLDV).
A conserved region which is located in the C-terminal section of these enzymes has been selected as a signature pattern for this family of TK.

Consensus pattern: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH]

[1] Boyle D.B., Coupar B.E.H., Gibbs A.J., Seigman L.J., Both G.W. Virology 156:355-365(1987). [2] Blasco R., Lopez-Otin C., Munoz M., Bockamp E.-O., Simon-Mateo C., Vinuela E. Virology 178:301-304(1990). [3] Robertson G.R., Whalley J.M. Nucleic Acids Res. 16:11303-11317(1988).

[1523] 641. Thymidine kinase from herpesvirus (TK herpes)
[1] 

Medline: 96003730 

Crystal structures of the thymidine kinase from herpes simplex virus type-1 in complex with deoxythymidine and ganciclovir.

Brown DG, Visse R, Sandhu G, Davies A, Rizkallah PJ, Melitz C, Summers WC, Sanderson MR;
Nat Struct Biol 1995;2:876-881. 
Number of members: 65 

[1524] 642. Nuclear transition protein 2 signatures (TP2) 

In mammals, the second stage of spermatogenesis is characterized by the conversion of nucleosomal chromatin to the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is associated with a double-protein transition. The first transition corresponds to the replacement of histones by several spermatid-specific proteins, also called transition proteins, which are themselves replaced by protamines during the second transition. Nuclear transition protein 2 (TP2) is one of those spermatid-specific proteins. TP2 is a basic zinc-binding protein [1] of 116 to 137 amino-acid residues. Structurally, TP2 consists of three distinct parts: a conserved serine-rich N-terminal domain of about 25 residues, a variable central domain of 20 to 50 residues which contains cysteine residues, and a conserved C-terminal domain of about 70 residues rich in lysines and arginines. Two signature patterns for TP2 have been developed: one located in the N-terminal domain, the other in the C-terminal domain.
Consensus pattern: H-x(3)-H-S-[NS]-S-x-P-Q-S
Consensus pattern: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K

[1] Baskaran R., Rao M.R.S. Biochem. Biophys. Res. Commun. 179:1491-1499(1991).
[1525] 643. Thiamine pyrophosphate enzymes signature (TPP enzymes)

A number of enzymes require thiamine pyrophosphate (TPP) (vitamin B1) as a cofactor. It has been shown [1] that some of these enzymes are structurally related. These related TPP enzymes are:
- Pyruvate oxidase (POX) (EC 1.2.3.3). Reaction catalyzed: pyruvate + orthophosphate + O(2) + H(2)O = acetyl phosphate + CO(2) + H(2)O(2).
- Pyruvate decarboxylase (PDC) (EC 4.1.1.1). Reaction catalyzed: pyruvate = acetaldehyde + CO(2).
- Indolepyruvate decarboxylase [2]. Reaction catalyzed: indole-3-pyruvate = indole-3-acetaldehyde + CO(2).
- Acetolactate synthase (ALS) (EC 4.1.3.18). Reaction catalyzed: 2 pyruvate = acetolactate + CO(2).
- Benzoylformate decarboxylase (BFD) (EC 4.1.1.7) [3]. Reaction catalyzed: benzoylformate = benzaldehyde + CO(2).
A conserved region which is located in their C-terminal section has been selected as a signature pattern for these enzymes.
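The reaction equations listed above can be checked for atom balance mechanically. The Python sketch below is illustrative only: each species is written as its neutral (fully protonated) molecular formula, which is an assumption since the entry does not specify protonation states, and only the four CO(2)-releasing reactions of PDC, indolepyruvate decarboxylase, ALS and BFD are covered; pyruvate oxidase is left out. The formula table and helper names are assumptions added for illustration.

```python
import re
from collections import Counter

# Neutral molecular formulas (assumed protonation state, for illustration only).
FORMULAS = {
    "pyruvate": "C3H4O3",
    "acetaldehyde": "C2H4O",
    "CO2": "CO2",
    "indole-3-pyruvate": "C11H9NO3",
    "indole-3-acetaldehyde": "C10H9NO",
    "acetolactate": "C5H8O4",
    "benzoylformate": "C8H6O3",
    "benzaldehyde": "C7H6O",
}

# Each reaction: (substrates, products) as lists of (coefficient, species).
REACTIONS = {
    "PDC": ([(1, "pyruvate")], [(1, "acetaldehyde"), (1, "CO2")]),
    "indolepyruvate decarboxylase": (
        [(1, "indole-3-pyruvate")],
        [(1, "indole-3-acetaldehyde"), (1, "CO2")],
    ),
    "ALS": ([(2, "pyruvate")], [(1, "acetolactate"), (1, "CO2")]),
    "BFD": ([(1, "benzoylformate")], [(1, "benzaldehyde"), (1, "CO2")]),
}


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C11H9NO3' (no parentheses)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(terms) -> Counter:
    """Sum atom counts over the (coefficient, species) terms of one side."""
    total = Counter()
    for coefficient, species in terms:
        for element, n in atom_counts(FORMULAS[species]).items():
            total[element] += coefficient * n
    return total


def is_balanced(reaction: str) -> bool:
    """True if both sides of the named reaction contain the same atoms."""
    substrates, products = REACTIONS[reaction]
    return side_total(substrates) == side_total(products)


if __name__ == "__main__":
    for name in REACTIONS:
        print(name, is_balanced(name))  # each line ends in True
```

Working with neutral formulas keeps the bookkeeping simple; checking charge balance as well would require assigning ionization states explicitly.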
Consensus pattern: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-D-[GSA]-[GSAC]
[1] Green J.B.A. FEBS Lett. 246:1-5(1989). [2] Koga J., Adachi T., Hidaka H. Mol. Gen. Genet. 226:10-16(1991). [3] Tsou A.Y., Ransom S.C., Gerlt J.A., Buechter D.D., Babbitt P.C., Kenyon G.L. Biochemistry 29:9856-9862(1990).
[1526] 644. TPR Domain 

[1]

Medline: 95397415 

Tetratrico peptide repeat interactions: to TPR or not to TPR? 
Lamb JR, Tugendreich S, Hieter P;
Trends Biochem Sci 1995;20:257-259.
[2] Medline: 98151343

The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated protein-protein
interactions.
Das AK, Cohen PW, Barford D;

EMBO J 1998;17:1192-1199.
Number of members: 621 

[1527] 645. Uroporphyrin-III C-methyltransferase signatures (TP methylase)

Uroporphyrin-III C-methyltransferase (EC 2.1.1.107) (SUMT) [1,2] catalyzes the transfer of two methyl groups from S-adenosyl-L-methionine to the C-2 and C-7 atoms of uroporphyrinogen III to yield precorrin-2 via the intermediate formation of precorrin-1.

SUMT is the first enzyme specific to the cobalamin pathway, and precorrin-2 is a common intermediate in the biosynthesis of corrinoids such as vitamin B12, siroheme and coenzyme F430. The sequences of SUMT from a variety of eubacterial and archaebacterial species are currently available. In species such as Bacillus megaterium (gene cobA), Pseudomonas denitrificans (cobA) or Methanobacterium ivanovii (gene corA), SUMT is a protein of about 25 to 30 Kd. In Escherichia coli and related bacteria, the cysG protein, which is involved in the biosynthesis of siroheme, is a multifunctional protein composed of an N-terminal domain, probably involved in transforming precorrin-2 into siroheme, and a C-terminal domain which has SUMT activity. The sequence of SUMT is related to that of a number of P. denitrificans and Salmonella typhimurium enzymes involved in the biosynthesis of cobalamin which also seem to be SAM-dependent methyltransferases [3,4]. The similarity is especially strong with two of these enzymes: cobI/cbiL, which encodes S-adenosyl-L-methionine-precorrin-2 methyltransferase, and cobM/cbiF, whose exact function is not known. Two signature patterns have been developed for these enzymes. The first corresponds to a well conserved region in the N-terminal extremity (called region 1 in [1,3]) and the second to a less conserved region located in the central part of these proteins (this pattern spans what are called regions 2 and 3 in [1,3]).
Consensus pattern: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG]
Consensus pattern: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYPAC]-x-[LI...
[1] Blanche F., Robin C., Couder M., Faucher D., Cauchois L., Cameron B., Crouzet J. J. Bacteriol. 173:4637-4645(1991). [2] Robin C., Blanche F., Cauchois L., Cameron B., Couder M., Crouzet J. J. Bacteriol. 173:4893-4896(1991). [3] Crouzet J., Cameron B., Cauchois L., Rigault S., Rouyez M.-C., Blanche F., Thibaut D., Debussche L. J. Bacteriol. 172:5980-5990(1990). [4] Roth J.R., Lawrence J.G., Rubenfield M., Kieffer-Higgins S., Church G.M. J. Bacteriol. 175:3303-3316(1993). [5] Mattheakis L.C., Shen W.H., Collier R.J. Mol. Cell. Biol. 12:4026-4037(1992).
[1528] 646. Tudor domain

Domain of unknown function present in several RNA-binding proteins; multiple copies are found in the Drosophila Tudor protein. Slight ambiguities in the alignment.
Number of members: 18

[1] Medline: 97200561 Tudor domains in proteins that interact with RNA. Ponting CP; Trends Biochem Sci 1997;22:51-52. [2] Medline: 97157029 The human EBNA-2 coactivator p100: multidomain organization and relationship to the staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Callebaut I, Mornon JP; Biochem J 1997;321:125-132.

[1529] 647. Terpene synthase family 

It has been suggested that this gene family be designated tps (for terpene synthase) [1]. It has been split into six subgroups, called tpsa-tpsf, on the basis of phylogeny. tpsa includes vetispiridiene synthase, Swiss:Q39979, 5-epi-aristolochene synthase, Swiss:Q40577, and (+)-delta-cadinene synthase, Swiss:P93665.
tpsb includes (-)-limonene synthase, Swiss:Q40322.
tpsc includes kaurene synthase A, Swiss:O04408.
tpsd includes taxadiene synthase, Swiss:Q41594, pinene synthase, Swiss:O24475, and myrcene synthase, Swiss:O24474.
tpse includes kaurene synthase B. 
tpsf includes linalool synthase. 
Number of members: 51 
[1] 

Medline: 97413772 

Monoterpene synthases from grand fir (Abies grandis). cDNA isolation, characterization, and functional expression of 
myrcene synthase, (-)-(4S)-limonene synthase, and (-)-(1S,5S)-pinene synthase.
Bohlmann J, Steele CL, Croteau R; 
J Biol Chem 1997;272:21784-21792. 
[1530] 648. ThiF family 

This family contains a repeated domain in ubiquitin activating enzyme E1 and members of the bacterial ThiF/MoeB/HesA family.
Number of members: 87
[1531] 649. Thioester dehydrase 

Members of this family are involved in fatty acid biosynthesis. 
Number of members: 19
[1]

Medline: 96398612

Structure of a dehydratase-isomerase from the bacterial pathway for biosynthesis of unsaturated fatty acids: two catalytic activities in one active site.
Leesong M, Henderson BS, Gillig JR, Schwab JM, Smith JL;
Structure 1996;4:253-264.
Database reference: SCOP; 1mka; fa;
Database reference: PFAMB; PB058036; 
[1532] 650. Tub family signatures 

The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. This mutation maps to a gene, tub [1,2], which codes for a protein that belongs to a family which currently consists of the following members:
- Mammalian tub, a hydrophilic protein of about 500 residues, which could be involved in the hypothalamic regulation of body weight.
- Human protein TULP1 [3], which may be involved in retinitis pigmentosa 14, a retinal degeneration disease.
- Mouse protein p4[...], whose function is not known.
- Caenorhabditis elegans hypothetical protein F10B5.4.
- Several fragmentary sequences from plants, Drosophila and human ESTs.
While the N-terminal part of these proteins is conserved neither in length nor in sequence, the C-terminal 250 residues are highly conserved. Therefore, two regions were selected in the C-terminal part as signature patterns. The second region is located at the C-terminal extremity and contains a penultimate cysteine residue that could be critical to the normal functioning of these proteins.

Consensus pattern: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q

Consensus pattern: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E

[1] Kleyn P.W., Fan W., Kovats S.G., Lee J.L., Pulido J.C., Wu Y., Berkemeier L.R., Misumi D.J., Holmgren L., Charlat O., Woolf E.A., Tayber O., Brody T., Shu R., Hawkins F., Kennedy B., Baldini L., Ebeling C., Alperin G.D., Deeds J., Lakey N.D., Culpepper J., Chen H., Gluecksmann-Kuis M.A., Carlson G.A., Duyk G.M., Moore K.J. Cell 85:281-290(1996). [2] Noben-Trauth K., Naggert J.K., North M.A., Nishina P.M. Nature 380:534-538(1996). [3] North M.A., Naggert J.K., Yan Y., Noben-Trauth K., Nishina P.M. Proc. Natl. Acad. Sci. U.S.A. 94:3128-3133(1997).
[1533] 651. Eukaryotic DNA topoisomerase I active site

DNA topoisomerase I (EC 5.99.1.2) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion of topological DNA isomers. Type I topoisomerases act by catalyzing the transient breakage of DNA, one strand at a time, and the subsequent rejoining of the strands. When a eukaryotic type I topoisomerase breaks a DNA backbone bond, it simultaneously forms a protein-DNA link where the hydroxyl group of a tyrosine residue is joined to a 3'-phosphate on DNA, at one end of the enzyme-severed DNA strand. In eukaryotic and poxvirus topoisomerases I, there are a number of conserved residues in the region around the active site tyrosine.
Consensus pattern: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3)-[LIVM] [Y is the active site tyrosine]
[1] Sternglanz R. Curr. Opin. Cell Biol. 1:533-535(1990). [2] Sharma A., Mondragon A. Curr. Opin. Struct. Biol. 5:39-47(1995). [3] Lynn R.M., Bjornsti M.-A., Caron P.R., Wang J.C. Proc. Natl. Acad. Sci. U.S.A. 86:3559-3563(1989). [4] Roca J. Trends Biochem. Sci. 20:156-160(1995). [E1]
[1534] 652. Transaldolase signatures 

Transaldolase (EC 2.2.1.2) catalyzes the reversible transfer of a three-carbon ketol unit from sedoheptulose 7-phosphate to glyceraldehyde 3-phosphate to form erythrose 4-phosphate and fructose 6-phosphate. This enzyme, together with transketolase, provides a link between the glycolytic and pentose-phosphate pathways. Transaldolase is an enzyme of about 34 Kd whose sequence has been well conserved throughout evolution. A lysine has been implicated [1] in the catalytic mechanism of the enzyme; it acts as a nucleophilic group that attacks the carbonyl group of fructose-6-phosphate. Transaldolase is evolutionarily related [2] to a bacterial protein of about 20 Kd (known as talC in Escherichia coli) whose exact function is not yet known. Two signature patterns have been developed for these proteins. The first, located in the N-terminal section, contains a perfectly conserved pentapeptide; the second includes the active site lysine.
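The carbon bookkeeping of the transaldolase reaction described above can be made explicit: removing the three-carbon ketol unit from the C7 donor leaves a C4 sugar, and adding it to the C3 acceptor gives a C6 sugar. The short Python sketch below simply encodes that arithmetic; the function name and the carbon-count table are illustrative assumptions, and phosphate groups are not counted.

```python
# Carbon-chain lengths of the sugar phosphates named in the transaldolase
# reaction (phosphate groups are not counted).
CARBONS = {
    "sedoheptulose 7-phosphate": 7,
    "glyceraldehyde 3-phosphate": 3,
    "erythrose 4-phosphate": 4,
    "fructose 6-phosphate": 6,
}

KETOL_UNIT = 3  # carbons transferred from the donor to the acceptor


def transaldolase_products(donor: str, acceptor: str, unit: int = KETOL_UNIT) -> tuple[int, int]:
    """Return (carbons left on the donor, carbons on the extended acceptor)."""
    donor_c, acceptor_c = CARBONS[donor], CARBONS[acceptor]
    if donor_c <= unit:
        raise ValueError("donor has too few carbons to give up the transferred unit")
    return donor_c - unit, acceptor_c + unit


if __name__ == "__main__":
    left, extended = transaldolase_products("sedoheptulose 7-phosphate",
                                            "glyceraldehyde 3-phosphate")
    print(left, extended)  # 4 6
    assert left == CARBONS["erythrose 4-phosphate"]
    assert extended == CARBONS["fructose 6-phosphate"]
```

Because the reaction is reversible, calling the same function with fructose 6-phosphate as donor and erythrose 4-phosphate as acceptor returns (3, 7), i.e. glyceraldehyde 3-phosphate and sedoheptulose 7-phosphate; the total of ten carbons is conserved in both directions.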

Consensus pattern: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2)

Consensus pattern: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[DENQPAS]-G-[LIVM]-x-[AGV]-x-[QEKRST]-x-[LIVM]
[K is the active site residue]

[1] Miosga T., Schaaff-Gerstenschlaeger I., Franken E., Zimmermann F.K. Yeast 9:1241-1249(1993). [2] Reizer J., Reizer A., Saier M.H. Jr. Microbiology 141:961-971(1995).

[1535] 653. (Transpeptidase) Penicillin binding protein transpeptidase domain 

[1536] The active site serine (residue 337 in Swiss:P14677) is conserved in all members of this family.

[1537] [1] Pares S, Mouz N, Petillot Y, Hakenbeck R, Dideberg O; Nat Struct Biol 1996;3:284-289.

[1538] 654. Trehalase signatures 

Trehalase (EC 3.2.1.28) is the enzyme responsible for the degradation of the disaccharide alpha,alpha-trehalose, yielding two glucose subunits [1]. It is an enzyme found in a wide variety of organisms and whose sequence has been highly conserved throughout evolution. Two of the most highly conserved regions have been selected as signature patterns. The first pattern is located in the central section, the second one in the C-terminal region.
Consensus pattern: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y
Consensus pattern: Q-W-D-x-P-x-[GA]-W-[PAS]-P

[1] Kopp M., Mueller H., Holzer H. J. Biol. Chem. 268:4766-4774(1993). [2] Henrissat B., Bairoch A. Biochem. J. 293:781-788(1993). [E1]

[1539] 655. Trehalose-6-phosphate synthase domain 

[1540] OtsA (trehalose-6-phosphate synthase) is homologous to regions in the subunits of the yeast trehalose-6-phosphate synthase/phosphatase complex [1].

[1541] [1] Kaasen I, McDougall J, Strom AR; Gene 1994;145:9-15.
[1542] 656. Tropomyosins signature 
Tropomyosins [1,2] are a family of closely related proteins present in muscle and non-muscle cells. In striated muscle, tropomyosin mediates the interactions between the troponin complex and actin so as to regulate muscle contraction. The role of tropomyosin in smooth muscle and non-muscle tissues is not clear. Tropomyosin is an alpha-helical protein that forms a coiled-coil dimer. Muscle isoforms of tropomyosin are characterized by having 284 amino acid residues and a highly conserved N-terminal region, whereas non-muscle forms are generally smaller and are heterogeneous in their N-terminal region. The signature pattern for tropomyosins is based on a very conserved region in the C-terminal section of tropomyosins which is present in both muscle and non-muscle forms.
Consensus pattern: L-K-E-A-E-x-R-A-E

[1] Smillie L.B. Trends Biochem. Sci. 4:151-155(1979). [2] McLeod A.R. BioEssays 6:208-212(1986).
[1543] 657. Troponin 

Troponin (Tn) contains three subunits: Ca2+ binding (TnC), inhibitory (TnI), and tropomyosin binding (TnT). This Pfam family contains members of the TnT subunit.

The troponin complex regulates Ca2+-induced muscle contraction.

This family includes troponin T and troponin I. Troponin I binds to actin and troponin T binds to tropomyosin.
Number of members: 81
[1] Medline: 87144593

Structure of co-crystals of tropomyosin and troponin.
White SP, Cohen C, Phillips GN Jr;
Nature 1987;325:826-828.
[2] Medline: 95155315

A direct regulatory role for troponin T and a dual role for
troponin C in the Ca2+ regulation of muscle contraction.
Potter JD, Sheng Z, Pan BS, Zhao J;

J Biol Chem 1995;270:2557-2562.

[3] Medline: 95324796
The troponin complex and regulation of muscle contraction.
Farah CS, Reinach FC;

FASEB J 1995;9:755-767.
[1544] 658. (Tryp mucin) Mucin-like glycoprotein

[1545] This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of three regions. The N and C termini are conserved between all members of the family, whereas the central region is not well conserved and contains a large number of threonine residues which can be glycosylated [1].

Indirect evidence suggested that these genes might encode the core protein of parasite mucins, glycoproteins that were proposed to be involved in the interaction with, and invasion of, mammalian host cells.

[1] Di Noia JM, Sanchez DO, Frasch AC; J Biol Chem 1995;270:24146-24149.

[2] Di Noia JM, D'Orso I, Aslund L, Sanchez DO, Frasch AC; J Biol Chem 1998;273:10843-10850.

[1546] 659. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold.
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1547] 660. Aminoacyl-transfer RNA synthetases class-I signature (tRNA synt 1b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure. A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to share the same tertiary structure based on a Rossmann fold.
Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]

[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P. Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M., Moras D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).

[1548] 661 . (tRNA-synt 1C) tRNA synthetases class I (E and Q) 
[1549] Other tRNA synthetase sub-families are too dissimilar to be included. 
This family includes only glutamyl and glutaminyl tRNA synthetases. 

In some organisms, a single glutamyl-tRNA synthetase aminoacylates both tRNA(Glu) and tRNA(Gln).

[1550] [1] Rath V.L., Silvian L.F., Beijer B., Sproat B.S., Steitz T.A. Structure 6:439-449(1998).

[1551] 662. (tRNA-synt 1d) tRNA synthetases class I (R)

[1552] Other tRNA synthetase sub-families are too dissimilar to be included. 

This family includes only arginyl tRNA synthetase. 

[1553] 663. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA-synt 2)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and one mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in the
catalytic domain that binds ATP and the amino acid; this fold differs from the Rossmann fold observed for the class-I
synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
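The two class-II patterns combine a variable-length gap, x(4,12), with an exclusion set, {DENQHRKP}. In a regular expression these become `.{4,12}` and the negated class `[^DENQHRKP]`. The following hand-translated Python sketch is illustrative only; the function name and the test peptides are assumptions, and the peptides are synthetic rather than real synthetase sequences.

```python
import re

# Hand translation of the two class-II consensus patterns.
CLASS_II_PATTERN_1 = re.compile(r"[FYH]R.[DE].{4,12}[RH].{3}F.{3}[DE]")
CLASS_II_PATTERN_2 = re.compile(
    r"[GSTALVF][^DENQHRKP][GSTA][LIVMF][DE]R[LIVMF].[LIVMSTAG][LIVMFY]"
)

def class_ii_hits(sequence):
    """Report which of the two class-II signatures occur in a protein sequence."""
    return {
        "pattern 1": CLASS_II_PATTERN_1.search(sequence) is not None,
        "pattern 2": CLASS_II_PATTERN_2.search(sequence) is not None,
    }

print(class_ii_hits("FRADAAAARAAAFAAAD" + "GASLERIKVF"))
print(class_ii_hits("GDSLERIKVF"))  # D is excluded at the second position
```

A gap longer than 12 residues between the [DE] and the [RH] positions is not tolerated by the first pattern, which is why the quantifier carries both bounds.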
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A.
88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S.
Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1554] 664. Aminoacyl-transfer RNA synthetases class-I signature (tRNA-synt 1e)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and one mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
A few years ago it was found [2] that several aminoacyl-tRNA synthetases share a region of similarity in their N-terminal
section; in particular, the consensus tetrapeptide His-Ile-Gly-His ('HIGH') is very well conserved. The 'HIGH' region has
been shown [3] to be part of the adenylate binding site. The 'HIGH' signature has been found in the aminoacyl-tRNA
synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine,
tryptophan, and valine. These aminoacyl-tRNA synthetases are referred to as class-I synthetases [4,5,6] and seem to
share the same tertiary structure based on a Rossmann fold.

Consensus pattern: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC]
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Webster T., Tsai H., Kula M., Mackie G.A., Schimmel P.
Science 226:1315-1317(1984). [3] Brick P., Bhat T.N., Blow D.M. J. Mol. Biol. 208:83-98(1988). [4] Delarue M.,
Moras D. BioEssays 15:675-687(1993). [5] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [6] Nagel G.M.,
Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[1555] 665. Aminoacyl-transfer RNA synthetases class-II signatures (tRNA-synt 2b)

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them
to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and one mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in the
catalytic domain that binds ATP and the amino acid; this fold differs from the Rossmann fold observed for the class-I
synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
[1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A.
88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack S.
Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1556] 666. Thaumatin family signature 

Thaumatin [1] is an intensely sweet-tasting protein (100 000 times sweeter than sucrose on a molar basis) from
Thaumatococcus daniellii, an African bush. The protein is made of about 200 residues and contains 8 disulfide bonds.
A number of proteins have been found to be related to thaumatins. These proteins are listed below (references are only
provided for recently determined sequences).
- A maize alpha-amylase/trypsin inhibitor.
- Two tobacco pathogenesis-related proteins: PR-R major and minor forms, which are induced after infection with viruses.
- Salt-induced protein NP24 from tomato.
- Osmotin, a salt-induced protein from tobacco.
- Osmotin-like proteins OSML13, OSML15 and OSML81 from potato [2].
- P21, a leaf protein from soybean.
- PWIR2, a leaf protein from wheat.
- Zeamatin, a maize antifungal protein [3].
The exact biological function of all these proteins is not yet known. A conserved region that includes
three cysteine residues known (in thaumatin) to be involved in disulfide bonds has been selected as a signature pattern.
[Figure: schematic of the disulfide bonds in thaumatin. 'C': conserved cysteine involved in a disulfide bond; '*': position of the pattern.]

Consensus pattern: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C
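As an illustrative sketch (not from the source), the thaumatin signature can be written as a Python regular expression with a capture group around each of its three cysteines, so that a match reports where those disulfide-forming residues sit. The function name is hypothetical and the test peptide is synthetic.

```python
import re

# G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C with the three cysteines captured.
THAUMATIN_SIGNATURE = re.compile(r"G.[GF].(C).T[GA]D(C).{1,2}G.{2,3}(C)")

def signature_cysteines(sequence):
    """Return 1-based positions of the three signature cysteines, or [] if absent."""
    match = THAUMATIN_SIGNATURE.search(sequence)
    if match is None:
        return []
    return [match.start(group) + 1 for group in (1, 2, 3)]

print(signature_cysteines("AAGAGACATGDCAGAAC"))
```

Because x(1,2) and x(2,3) are variable, the cysteine positions have to be read from the match groups rather than computed from a fixed offset.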

[1] Edens L., Heslinga L., Klok R., Ledeboer A.M., Maat J., Toonen M.Y., Visser C., Verrips C.T. Gene 18:1-12(1982).
[2] Zhu B., Chen T.H.H., Li P.H. Plant Physiol. 108:929-937(1995). [3] Malehorn D.E., Borgmeyer J.R., Smith C.E.,
Shah D.M. Plant Physiol. 106:1471-1481(1994).
[1557] 667. Thiolases signatures 

Two different types of thiolase [1,2,3] are found both in eukaryotes and in prokaryotes: acetoacetyl-CoA thiolase
(EC 2.3.1.9) and 3-ketoacyl-CoA thiolase (EC 2.3.1.16). 3-ketoacyl-CoA thiolase (also called thiolase I) has a broad
chain-length specificity for its substrates and is involved in degradative pathways such as fatty acid beta-oxidation.
Acetoacetyl-CoA thiolase (also called thiolase II) is specific for the thiolysis of acetoacetyl-CoA and is involved in
biosynthetic pathways such as poly beta-hydroxybutyrate synthesis or steroid biogenesis. In eukaryotes, there are two
forms of 3-ketoacyl-CoA thiolase: one located in the mitochondrion and the other in peroxisomes. There are two conserved
cysteine residues important for thiolase activity. The first, located in the N-terminal section of the enzymes, is involved
in the formation of an acyl-enzyme intermediate; the second, located at the C-terminal extremity, is the active site base
involved in deprotonation in the condensation reaction. Mammalian nonspecific lipid-transfer protein (nsL-TP) (also
known as sterol carrier protein 2) is a protein which seems to exist in two different forms: a 14 Kd protein (SCP-2) and
a larger 58 Kd protein (SCP-x). The former is found in the cytoplasm or the mitochondria and is involved in lipid transport;
the latter is found in peroxisomes.

The C-terminal part of SCP-x is identical to SCP-2, while the N-terminal portion is evolutionarily related to thiolases [4].
Three signature patterns have been developed for this family of proteins, two of which are based on the regions around
the biologically important cysteines. The third is based on a highly conserved region in the C-terminal part of these
proteins.

Consensus pattern: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM]
[C is involved in formation of acyl-enzyme intermediate]

Consensus pattern: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-[GA]-x-[ST]-G

Consensus pattern: [AG]-[LIVMA]-[STAGCLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG]
[C is the active site residue]
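The first and third thiolase patterns each mark one functionally important cysteine. A minimal Python sketch, assuming the patterns as printed here, can capture that residue and report its position; the dictionary labels and function name are illustrative, and the two test peptides are synthetic.

```python
import re

# First and third thiolase patterns, with the annotated cysteine captured.
THIOLASE_PATTERNS = {
    "acyl-enzyme cysteine": re.compile(
        r"[LIVM][NST].{2}(C)[SAGLI][ST][SAG][LIVMFYNS].[STAG][LIVM].{6}[LIVM]"
    ),
    "active-site cysteine": re.compile(
        r"[AG][LIVMA][STAGCLIVM][STAG][LIVMA](C).[AG].[AG].[AG].[SAG]"
    ),
}

def annotate_cysteines(sequence):
    """Map each matching pattern label to the 1-based position of its cysteine."""
    hits = {}
    for label, regex in THIOLASE_PATTERNS.items():
        match = regex.search(sequence)
        if match is not None:
            hits[label] = match.start(1) + 1
    return hits

print(annotate_cysteines("MKLNAACSSSLASLAAAAAAL"))
print(annotate_cysteines("ALSSLCAAAAAAAS"))
```

The second (C-terminal, cysteine-free) pattern could be added to the dictionary without a capture group if only presence is of interest.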

[1] Peoples O.P., Sinskey A.J. J. Biol. Chem. 264:15293-15297(1989). [2] Yang S.-Y., Yang X.-Y.H., Healy-Louie G.,
Schulz H., Elzinga M. J. Biol. Chem. 265:10424-10429(1990). [3] Igual J.C., Gonzalez-Bosch C., Dopazo J.,
Perez-Ortin J.E. J. Mol. Evol. 35:147-155(1992). [4] Baker M.E., Billheimer J.T., Strauss J.F. III DNA Cell Biol.
10:695-698(1991).

[1558] 668. Thioredoxin family active site 

Thioredoxins [1 to 4] are small proteins of approximately one hundred amino-acid residues which participate in various
redox reactions via the reversible oxidation of an active center disulfide bond. They exist in either a reduced form or
an oxidized form where the two cysteine residues are linked in an intramolecular disulfide bond. Thioredoxin is present
in prokaryotes and eukaryotes, and the sequence around the redox-active disulfide bond is well conserved. Bacteriophage
T4 also encodes a thioredoxin, but its primary structure is not homologous to bacterial, plant and vertebrate
thioredoxins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; all of them seem to
be protein disulfide isomerases (PDI). PDI (EC 5.3.4.1) [5,6,7] is an endoplasmic reticulum enzyme that catalyzes
the rearrangement of disulfide bonds in various proteins. The various forms of PDI which are currently known are:
- PDI major isozyme, a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase
(EC 1.14.11.2), as a component of oligosaccharyl transferase (EC 2.4.1.119), as thyroxine deiodinase (EC 3.8.1.4), as
glutathione-insulin transhydrogenase (EC 1.8.4.2) and as a thyroid hormone-binding protein.
- ERp60 (ER-60; 58 Kd microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific
phospholipase C isozyme and later to be a protease.
- ERp72.
- P5.
All PDIs contain two or three (ERp72) copies of the thioredoxin domain. Bacterial proteins that act as thiol:disulfide
interchange proteins that allow disulfide bond formation in some periplasmic proteins also contain a thioredoxin domain.
These proteins are:
- Escherichia coli dsbA (or prfA) and its orthologs in Vibrio cholerae (tcpG) and Haemophilus influenzae (por).
- Escherichia coli dsbC (or xprA) and its orthologs in Erwinia chrysanthemi and Haemophilus influenzae.
- Escherichia coli dsbD (or dipZ) and its Haemophilus influenzae ortholog.
- Escherichia coli dsbE (or ccmG) and orthologs in Haemophilus influenzae, Rhodobacter capsulatus (helX) and
Rhizobiaceae (cycY and tlpA).
Consensus pattern: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT]
[The two C's form the redox-active bond]
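In the thioredoxin pattern the two redox-active cysteines are separated by exactly two residues (C-x-x-C). A minimal Python sketch, assuming the pattern as printed, captures both cysteines and reports their positions; the function name is illustrative and the test peptide is simply an arbitrary string that satisfies the pattern.

```python
import re

# Thioredoxin active-site pattern with the two redox-active cysteines captured.
THIOREDOXIN_SITE = re.compile(
    r"[LIVMF][LIVMSTA].[LIVMFYC][FYWSTHE].{2}[FYWGTN](C)[GATPLVE][PHYWSTA](C).{6}[LIVMFYWT]"
)

def redox_cysteines(sequence):
    """Return 1-based positions of the two active-site cysteines, or None."""
    match = THIOREDOXIN_SITE.search(sequence)
    if match is None:
        return None
    return match.start(1) + 1, match.start(2) + 1

first, second = redox_cysteines("LVDFWAEWCGPCKMIAPIL")
print(first, second, second - first)
print(redox_cysteines("LVDFWAEWSGPCKMIAPIL"))  # first cysteine replaced by serine
```

Losing either cysteine abolishes the match, mirroring the requirement that both residues form the redox-active disulfide.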

[1] Holmgren A. Annu. Rev. Biochem. 54:237-271(1985). [2] Gleason F.K., Holmgren A. FEMS Microbiol. Rev.
54:271-297(1988). [3] Holmgren A. J. Biol. Chem. 264:13963-13966(1989). [4] Eklund H., Gleason F.K., Holmgren A.
Proteins 11:13-28(1991). [5] Freedman R.B., Hawkins H.C., Murant S.J., Reid L. Biochem. Soc. Trans. 16:96-99(1988).
[6] Kivirikko K.I., Myllyla R., Pihlajaniemi T. FASEB J. 3:1609-1617(1989). [7] Freedman R.B., Hirst T.R., Tuite M.F.
Trends Biochem. Sci. 19:331-336(1994).

[1559] 669. (Transcript fac2) Transcription factor TFIIB repeat signature 

In eukaryotes the initiation of transcription of protein-encoding genes by RNA polymerase II is modulated by general and
specific transcription factors. The general transcription factors operate through common promoter elements (such as
the TATA box). At least seven different proteins associate to form the general transcription factors: TFIIA, -IIB, -IID,
-IIE, -IIF, -IIG, and -IIH [1]. Transcription factor IIB (TFIIB) plays a central role in the transcription of class-II genes. It
associates with a complex of TFIID-IIA bound to DNA (DA complex) to form a ternary complex TFIID-IIA-IIB (DAB
complex) which is then recognized by RNA polymerase II [2,3]. TFIIB is a protein of about 315 to 340 amino-acid
residues which contains, in its C-terminal part, an imperfect repeat of a domain of about 75 residues. This repeat could
contribute an element of symmetry to the folded protein. The following proteins have been shown to be evolutionarily
related to TFIIB:
- An archaebacterial TFIIB homolog. In Pyrococcus woesei a previously undetected open reading frame has been
shown [4] to be highly related to TFIIB.
- Fungal transcription factor IIIB 70 Kd subunit (gene PCF4/TDS4/BRF1) [5]. This protein is a general activator of RNA
polymerase III transcription and plays a role analogous to that of TFIIB in pol III transcription.
The central section of the repeated domain, which is the most conserved part of that domain, has been selected as a
signature pattern.

Consensus pattern: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-[GSA]-[STAC]
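Because TFIIB carries an imperfect repeat of the signature-bearing domain, a scan may report more than one hit per sequence. The sketch below, assuming Python's `re.finditer`, lists every non-overlapping match; the function name and test protein are hypothetical. The test protein contains two invented copies of the motif, and no claim is made here about how many copies a real TFIIB sequence matches.

```python
import re

# G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-[GSA]-[STAC]
TFIIB_REPEAT = re.compile(
    r"G[KR].{3}[STAGN].[LIVMYA][GSTA]{2}[CSAV][LIVM][LIVMFY][LIVMA][GSA][STAC]"
)

def repeat_starts(sequence):
    """Return the 1-based start of every non-overlapping match of the signature."""
    return [m.start() + 1 for m in TFIIB_REPEAT.finditer(sequence)]

test_protein = "M" + "GRAAASALAACLLLAS" + "Q" * 20 + "GKPEDTQIGSVIFVGT" + "E"
print(repeat_starts(test_protein))
```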
[1] Weinmann R. Gene Expr. 2:81-91(1992). [2] Hawley D. Trends Biochem. Sci. 16:317-318(1991). [3] Ha I., Lane W.S.,
Reinberg D. Nature 352:689-695(1991). [4] Ouzounis C., Sander C. Cell 71:189-190(1992). [5] Khoo B., Brophy B.,
Jackson S.P. Genes Dev. 8:2879-2890(1994).
[1560] 670. (transcript fact) MADS-box domain signature and profile

A number of transcription factors contain a conserved domain of 56 amino-acid residues, sometimes known as the
MADS-box domain [E1]. They are listed below:
- Serum response factor (SRF) [1], a mammalian transcription factor that binds to the Serum Response Element (SRE).
This is a short sequence of dyad symmetry located 300 bp to the 5' end of the transcription initiation site of genes
such as c-fos.
- Mammalian myocyte-specific enhancer factors 2A to 2D (MEF2A to MEF2D). These proteins are transcription factors
which bind specifically to the MEF2 element present in the regulatory regions of many muscle-specific genes.
- Drosophila myocyte-specific enhancer factor 2 (MEF2).
- Yeast GRM/PRTF protein (gene MCM1) [2], a transcriptional regulator of mating-type-specific genes.
- Yeast arginine metabolism regulation protein I (gene ARGR1 or ARG80).
- Yeast transcription factor RLM1.
- Yeast transcription factor SMP1.
- Arabidopsis thaliana agamous protein (AG) [3], a probable transcription factor involved in regulating genes that
determine stamen and carpel development in wild-type flowers. Mutations in the AG gene result in the replacement
of the stamens by petals and the carpels by a new flower.
- Arabidopsis thaliana homeotic proteins Apetala1 (AP1), Apetala3 (AP3) and Pistillata (PI), which act locally to specify
the identity of the floral meristem and to determine sepal and petal development [4].
- Antirrhinum majus and tobacco homeotic proteins deficiens (DEFA) and globosa (GLO) [5]. Both proteins are
transcription factors involved in the genetic control of flower development. Mutations in DEFA or GLO cause the
transformation of petals into sepals and of stamens into carpels.
- Arabidopsis thaliana putative transcription factors AGL1 to AGL6 [6].
- Antirrhinum majus morphogenetic protein DEF H33 (squamosa).
In SRF, the conserved domain has been shown [1] to be involved in DNA-binding and dimerization. A pattern that spans
the complete length of the domain has been derived. The profile also spans the length of the MADS-box.
Consensus pattern: R-x-[RK]-x(5)-I-x-[DNGSK]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-
[STA]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY]
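The MADS-box pattern is long but uses only single-residue elements and repeat counts such as [RK](3) and K(2), which map to `{3}` and `{2}` in a regular expression. A hand-translated sketch, assuming Python's `re` module; the test string is synthetic, built element by element from the pattern with "A" filling each x, and only exercises the pattern's structure. Since the pattern spans the whole domain, `fullmatch` is used on the test string.

```python
import re

MADS_BOX = re.compile(
    r"R.[RK].{5}I.[DNGSK].{3}[KR].{2}T[FY].[RK]{3}.{2}[LIVM].K{2}A.E[LIVM][STA]"
    r".L.{4}[LIVM].[LIVM]{3}.{6}[LIVMF].{2}[FY]"
)

# Synthetic string built element by element from the pattern ("A" fills each x).
test = (
    "RAK" + "AAAAA" + "IAD" + "AAA" + "KAA" + "TFA" + "KRK" + "AA" + "LA"
    + "KK" + "AAE" + "LSAL" + "AAAA" + "LA" + "LLL" + "AAAAAA" + "LAA" + "F"
)
print(MADS_BOX.fullmatch(test) is not None, len(test))
```

Breaking the run of three basic residues, for example by replacing "KRK" with "KAK", is enough to make the search fail.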

[1] Norman C., Runswick M., Pollock R., Treisman R. Cell 55:989-1003(1988). [2] Passmore S., Maine G.T., Elble R.,
Christ C., Tye B.-K. J. Mol. Biol. 204:593-606(1988). [3] Yanofsky M., Ma H., Bowman J., Drews G., Feldmann K.A.,
Meyerowitz E.M. Nature 346:35-39(1990). [4] Goto K., Meyerowitz E.M. Genes Dev. 8:1548-1560(1994). [5] Troebner
W., Ramirez L., Motte P., Hue I., Huijser P., Loennig W.-E., Saedler H., Sommer H., Schwarz-Sommer Z. EMBO J.
11:4693-4704(1992). [6] Ma H., Yanofsky M.F., Meyerowitz E.M. Genes Dev. 5:484-495(1991). [E1]
[1561] 671. Transketolase signatures

Transketolase (EC 2.2.1.1) (TK) catalyzes the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate
to an aldose acceptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate.
This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways.
TK requires thiamin pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer
of approximately 70 Kd subunits. TK sequences from a variety of eukaryotic and prokaryotic sources [1,2] show that
the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Hansenula polymorpha,
there is a highly related enzyme, dihydroxyacetone synthase (DHAS) (EC 2.2.1.3) (also known as formaldehyde
transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
1-deoxyxylulose-5-phosphate synthase (DXP synthase) [3] is an enzyme so far found in bacteria (gene dxs) and plants
(gene CLA1) which catalyzes the thiamin pyrophosphate-dependent acyloin condensation reaction between carbon atoms
2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in
the biosynthetic pathway to isoprenoids, thiamin (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is
evolutionarily related to TK. Two regions of TK have been selected as signature patterns. The first, located in the
N-terminal section, contains a histidine residue which appears to function in proton transfer during catalysis [4]. The
second, located in the central section, contains conserved acidic residues that are part of the active cleft and may
participate in substrate-binding [4].
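A small carbon-bookkeeping sketch of the transketolase reaction described above, assuming only the carbon counts of the sugars involved (two pentoses, a heptulose and a triose): removing the two-carbon ketol unit from the C5 donor leaves a C3 product, and adding it to the C5 acceptor gives a C7 product. The dictionary and function names are illustrative.

```python
# Carbon counts: xylulose and ribose are pentoses, sedoheptulose is a
# heptulose, and glyceraldehyde is a triose.
CARBONS = {
    "xylulose 5-phosphate": 5,
    "ribose 5-phosphate": 5,
    "sedoheptulose 7-phosphate": 7,
    "glyceraldehyde 3-phosphate": 3,
}
KETOL_UNIT = 2  # carbons moved from the ketose donor to the aldose acceptor

def after_transfer(donor, acceptor):
    """Return (donor carbons, acceptor carbons) after the two-carbon transfer."""
    return CARBONS[donor] - KETOL_UNIT, CARBONS[acceptor] + KETOL_UNIT

donor_left, acceptor_gain = after_transfer("xylulose 5-phosphate", "ribose 5-phosphate")
print(donor_left, acceptor_gain)
```

The totals on both sides (5 + 5 and 3 + 7) are equal, as expected for a transfer reaction with no carbon lost.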

Consensus pattern: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-[LIMC]-[GS]
Consensus pattern: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYW]-x(2)-[STAP]-x(2)-[RGA]
[1] Abedinia M., Layfield R., Jones S.M., Nixon P.F., Mattick J.S. Biochem. Biophys. Res. Commun. 183:1159-1166
(1992). [2] Fletcher T.S., Kwee I.L., Nakada T., Largman C., Martin B.M. Biochemistry 31:1892-1896(1992). [3]
Sprenger G.A., Schorken U., Wiegert T., Grolle S., De Graaf A.A., Taylor S.V., Begley T.P., Bringer-Meyer S., Sahm
H. Proc. Natl. Acad. Sci. U.S.A. 94:12857-12862(1997). [4] Lindqvist Y., Schneider G., Ermler U., Sundstroem M.
EMBO J. 11:2373-2379(1992).

[1562] 672. Transmembrane 4 family signature 

Recently a number of eukaryotic cell surface antigens have been found to be evolutionarily related [1,2,3]. The proteins
known to belong to this family are listed below:
- Mammalian antigen CD9 (MIC3), a protein involved in platelet activation and aggregation.
- Mammalian leukocyte antigen CD37, expressed on B lymphocytes.
- Mammalian leukocyte antigen CD53 (OX-44), which may be involved in growth regulation in hematopoietic cells.
- Mammalian lysosomal membrane protein CD63 (melanoma-associated antigen ME491; antigen AD1).
- Mammalian antigen CD81 (cell surface protein TAPA-1), which may play an important role in the regulation of
lymphoma cell growth.
- Mammalian antigen CD82 (protein R2; antigen C33; Kangai 1 (KAI1)), which associates with CD4 or CD8 and delivers
costimulatory signals for the TCR/CD3 pathway.
- Mammalian antigen CD151 (SFA-1; platelet-endothelial tetraspan antigen 3 (PETA-3)).
- Mammalian cell surface glycoprotein A15 (TALLA-1; MXS1).
- Mammalian novel antigen 2 (NAG-2).
- Human tumor-associated antigen CO-029.
- Schistosoma mansoni and japonicum 23 Kd surface antigen (SM23 / SJ23).
These proteins share the following characteristics: they all seem to be type III membrane proteins (type III proteins are
integral membrane proteins that contain an N-terminal membrane-anchoring domain which is not cleaved during
biosynthesis and which functions both as a translocation signal and as a membrane anchor); they also contain three
additional transmembrane regions and at least seven conserved cysteine residues, and they are of approximately the
same size (218 to 284 residues). These proteins are collectively known as the 'transmembrane 4 superfamily' (TM4)
because they span the plasma membrane four times. A schematic diagram of the domain structure of these proteins
is shown below.

[Figure: domain structure of TM4 proteins (TMa, Extra, TM2, Cyt, TM3, Extracellular, TM4, Cyt). Cyt: cytoplasmic
domain. TMa: transmembrane anchor. TM2 to TM4: transmembrane regions 2 to 4. 'C': conserved cysteine. '*': position
of the pattern.]
A conserved region that includes two cysteines and seems to be located in a short cytoplasmic loop between two 
transmembrane domains has been selected as a signature for these proteins. 
Consensus pattern: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-[STA]-x(2)-[EG]

[1] Levy S., Nguyen V.Q., Andria M.L., Takahashi S. J. Biol. Chem. 266:14597-14602(1991). [2] Tomlinson M.G.,
Williams A.F., Wright M.D. Eur. J. Immunol. 23:136-140(1993). [3] Barclay A.N., Birkeland M.L., Brown M.H., Beyers
A.D., Davis S.J., Somoza C., Williams A.F. The leucocyte antigen factsbook. Academic Press, London / San Diego (1993).
[1563] 673. Tryptophan synthase alpha chain signature 

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the
aldol cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. A conserved region
that contains three conserved acidic residues has been selected as a signature pattern for the alpha chain. The first
and the third acidic residues are believed to serve as proton donors/acceptors in the enzyme's catalytic mechanism.
Consensus pattern: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DEHPA]-[LIVMY]-[AGLI]-[DE]-G

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1564] 674. Tryptophan synthase beta chain pyridoxal-phosphate attachment site

Tryptophan synthase catalyzes the last step in the biosynthesis of tryptophan: the conversion of indoleglycerol
phosphate and serine to tryptophan and glyceraldehyde 3-phosphate [1,2]. It has two functional domains: one for the
aldol cleavage of indoleglycerol phosphate to indole and glyceraldehyde 3-phosphate and the other for the synthesis of
tryptophan from indole and serine. In bacteria and plants [3], each domain is found on a separate subunit (alpha and
beta chains), while in fungi the two domains are fused together on a single multifunctional protein. The beta chain of
the enzyme requires pyridoxal-phosphate as a cofactor. The pyridoxal-phosphate group is attached to a lysine residue.
The region around this lysine residue also contains two histidine residues which are part of the pyridoxal-phosphate
binding site. The signature pattern for the tryptophan synthase beta chain is derived from that conserved region.

Consensus pattern: [LIVM]-x-H-x-G-[STA]-H-K-x-N [K is the pyridoxal-P attachment site]
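Every element of the tryptophan synthase beta chain pattern matches exactly one residue, so the pyridoxal-phosphate lysine always sits at a fixed offset (the 8th residue) within a match. A minimal Python sketch exploiting that, without a capture group; the constant and function names are illustrative and the test peptides are synthetic.

```python
import re

# [LIVM]-x-H-x-G-[STA]-H-K-x-N: a fixed-length, 10-residue pattern.
TRPB_SITE = re.compile(r"[LIVM].H.G[STA]HK.N")
LYSINE_OFFSET = 7  # 0-based offset of K within the pattern

def plp_lysine_position(sequence):
    """Return the 1-based position of the pyridoxal-P attachment lysine, or None."""
    match = TRPB_SITE.search(sequence)
    if match is None:
        return None
    return match.start() + LYSINE_OFFSET + 1

print(plp_lysine_position("MSELAHAGSHKAN"))
print(plp_lysine_position("MSELAHAGSHRAN"))  # K replaced by R: no match
```

A fixed offset only works because the pattern has no variable-length gaps; patterns with x(n,m) need capture groups instead.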

[1] Crawford I.P. Annu. Rev. Microbiol. 43:567-600(1989). [2] Hyde C.C., Miles E.W. Bio/Technology 8:27-32(1990).
[3] Berlyn M.B., Last R.L., Fink G.R. Proc. Natl. Acad. Sci. U.S.A. 86:4604-4608(1989).
[1565] 675. Serine proteases, trypsin family, active sites 

The catalytic activity of the serine proteases from the trypsin family is provided by a charge relay system involving an
aspartic acid residue hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine. The sequences in
the vicinity of the active site serine and histidine residues are well conserved in this family of proteases [1]. A partial
list of proteases known to belong to the trypsin family is shown below.
- Acrosin.
- Blood coagulation factors VII, IX, X, XI and XII, thrombin, plasminogen, and protein C.
- Cathepsin G.
- Chymotrypsins.
- Complement components C1r, C1s, C2, and complement factors B, D and I.
- Complement-activating component of RA-reactive factor.
- Cytotoxic cell proteases (granzymes A to H).
- Duodenase I.
- Elastases 1, 2, 3A, 3B (protease E), leukocyte (medullasin).
- Enterokinase (EC 3.4.21.9) (enteropeptidase).
- Hepatocyte growth factor activator.
- Hepsin.
- Glandular (tissue) kallikreins (including EGF-binding protein types A, B, and C, NGF-gamma chain, gamma-renin,
prostate specific antigen (PSA) and tonin).
- Plasma kallikrein.
- Mast cell proteases (MCP) 1 (chymase) to 8.
- Myeloblastin (proteinase 3) (Wegener's autoantigen).
- Plasminogen activators (urokinase-type, and tissue-type).
- Trypsins I, II, III, and IV.
- Tryptases.
- Snake venom proteases such as ancrod, batroxobin, cerastobin, flavoxobin, and protein C activator.
- Collagenase from common cattle grub and collagenolytic protease from Atlantic sand fiddler crab.
- Apolipoprotein(a).
- Blood fluke cercarial protease.
- Drosophila trypsin-like proteases: alpha, easter, snake-locus.
- Drosophila protease stubble (gene sb).
- Major mite fecal allergen Der p III.
All the above proteins belong to family S1 in the classification of peptidases [2,E1] and originate from eukaryotic
species. It should be noted that bacterial proteases that belong to family S2A are similar enough in the regions of the
active site residues that they can be picked up by the same patterns. These proteases are listed below:
- Achromobacter lyticus protease I.
- Lysobacter alpha-lytic protease.
- Streptogrisin A and B (Streptomyces proteases A and B).
- Streptomyces griseus glutamyl endopeptidase II.
- Streptomyces fradiae proteases 1 and 2.

Consensus pattern: [LIVM]-[ST]-A-[STAG]-H-C [H is the active site residue]

Consensus pattern: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH]
[S is the active site residue]
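The two trypsin-family patterns each contain one member of the charge relay system. A minimal Python sketch, assuming the patterns as printed, scans a sequence with both and reports the 1-based positions of the active-site histidine and serine. The function name is illustrative and the test sequence is synthetic; the aspartic acid of the charge relay is not covered by these patterns.

```python
import re

# His pattern: [LIVM]-[ST]-A-[STAG]-H-C ; Ser pattern: [DNSTAGC]-...-G-[DE]-S-G-...
HIS_SITE = re.compile(r"[LIVM][ST]A[STAG](H)C")
SER_SITE = re.compile(
    r"[DNSTAGC][GSTAPIMVQH].{2}G[DE](S)G[GS][SAPHV][LIVMFYWH][LIVMFYSTANQH]"
)

def active_site_residues(sequence):
    """Map 'His' and 'Ser' to the 1-based positions of the active-site residues."""
    result = {}
    for name, regex in (("His", HIS_SITE), ("Ser", SER_SITE)):
        match = regex.search(sequence)
        result[name] = match.start(1) + 1 if match else None
    return result

test_sequence = "IVGGLSAAHC" + "Y" * 10 + "DSCQGDSGGPLV" + "N"
print(active_site_residues(test_sequence))
```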

[1] Brenner S. Nature 334:528-530(1988). [2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994). [E1]
[1566] 676. (tsp) Thrombospondin type 1 domain
[1567] [1] Bork P. FEBS Lett. 327:125-130(1993).

[1568] 677. Tubulin subunits alpha, beta, and gamma signature

Tubulins [1,2], the major constituent of microtubules, are dimeric proteins which consist of two closely related subunits
(alpha and beta). Tubulin binds two molecules of GTP at two different sites (N and E). At the E (exchangeable) site,
GTP is hydrolyzed during incorporation into the microtubule. Near the E site is an invariant region rich in glycines which
is found in both chains and which is now [3] said to control the access of the nucleotide to its binding site. A signature
pattern was developed from this region. With the exception of the simple eukaryotes, most species express a variety
of closely related alpha and beta isotypes. In most species there is a third member of the tubulin family: gamma tubulin.
Gamma tubulin is found at microtubule organizing centers (MTOC) such as the spindle poles or the centrosome,
suggesting that it is involved in the minus-end nucleation of microtubule assembly [4].
Consensus pattern: [SAG]-G-G-T-G-[SA]-G 

[1] Cleveland D.W., Sullivan K.F. Annu. Rev. Biochem. 54:331-365(1985). [2] Joshi H.C., Cleveland D.W. Cell Motil.
Cytoskeleton 16:159-163(1990). [3] Hesse J., Thierauf M., Ponstingl H. J. Biol. Chem. 262:15472-15475(1987). [4]
Joshi H.C. BioEssays 15:637-643(1993).
[1569] Tubulin-beta mRNA autoregulation signal

The stability of beta-tubulin mRNAs is autoregulated by their own translation product [1]. Unpolymerized tubulin
subunits bind directly (or activate a factor(s) which binds co-translationally) to the nascent N-terminus of beta-tubulin.
This binding is transduced through the adjacent ribosomes to activate an RNase that degrades the polysome-bound mRNA.
The recognition element has been shown to be the first four amino acids of beta-tubulin: Met-Arg-Glu-Ile. Mutations
to this sequence abolish the autoregulation effect (except for the replacement of Glu by Asp); transposition of this
sequence to an internal region of a polypeptide also suppresses the autoregulatory effect.
Consensus pattern: <M-R-[DE]-[IL] 
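The leading `<` in this pattern anchors it to the N-terminus, which matches the observation that the tetrapeptide is only effective at the start of the chain, while [DE] reflects the tolerated Glu-to-Asp replacement. A minimal Python sketch, assuming `re.match` (which only tries position 0) as the way to express the anchor; the function name and test strings are illustrative, and a sequence match is only a proxy for the co-translational mechanism described above.

```python
import re

# '<M-R-[DE]-[IL]': anchored at the N-terminus, so re.match is used, not re.search.
AUTOREGULATION_SIGNAL = re.compile(r"MR[DE][IL]")

def has_autoregulation_signal(sequence):
    """True if the sequence begins with the beta-tubulin autoregulation motif."""
    return AUTOREGULATION_SIGNAL.match(sequence) is not None

print(has_autoregulation_signal("MREIVHIQ"))  # Met-Arg-Glu-Ile at the N-terminus
print(has_autoregulation_signal("MRDIVHIQ"))  # Glu replaced by Asp
print(has_autoregulation_signal("MKAMREIV"))  # motif present only internally
```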

[1] Cleveland D.W. Trends Biochem. Sci. 13:339-343(1988).

[1570] 678. (tRNA-synt 2c) Aminoacyl-transfer RNA synthetases class-II signatures

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to
specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty
different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally
two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and one mitochondrial form. While all
these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.
The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine,
and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in the
catalytic domain that binds ATP and the amino acid; this fold differs from the Rossmann fold observed for the class-I
synthetases [7]. Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved
regions are present [2,5,8]. Signature patterns have been derived from two of these regions.
Consensus pattern: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]
[1571] [1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987). [2] Delarue M., Moras D. BioEssays 15:675-687
(1993). [3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991). [4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A.
88:8121-8125(1991). [5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991). [6] Cusack
S. Biochimie 75:1077-1081(1993). [7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature
347:249-255(1990). [8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).
[1572] 679. UBA-domain 

[1573] The UBA domain (ubiquitin-associated domain) is a novel sequence motif found in several proteins having
connections to ubiquitin and the ubiquitination pathway. The structure of the UBA domain consists of a compact
three-helix bundle [1]. Number of members: 84

[1574] [1] Structure of a human DNA repair protein UBA domain that interacts with HIV-1 Vpr. Dieckmann T, Withers-Ward ES,
Jarosinski MA, Liu CF, Chen IS, Feigon J; Nat Struct Biol 1998;5:1042-1047
[1575] 680. UBX domain 

Domain present in ubiquitin-regulatory proteins. Present in FAF1 and Shp1p. Number of members: 19

[1] The UBA domain: a sequence motif present in multiple enzyme classes of the ubiquitination pathway. Hofmann K,
Bucher P; Trends Biochem Sci 1996;21:172-173.

[1576] 681. (UCH) Ubiquitin carboxyl-terminal hydrolases family 1 cysteine active site

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as of ubiquitinated proteins. There are two distinct families of UCH. The first class
consists of enzymes of about 25 Kd and is currently represented by: - Mammalian isozymes L1 and L3. - Yeast YUH1.
- Drosophila Uch. One of the active site residues of class-I UCH [3] is a cysteine. A signature pattern has been derived
from the region around that residue.
Consensus pattern: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA] [C is the active site residue]

[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Johnston S.C., Larsen C.N., Cook W.J., Wilkinson K.D., Hill C.P. EMBO
J. 16:3787-3796(1997). [4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
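To report where the annotated active-site residue falls, the class-I UCH pattern can be hand-translated into a regular expression with a capturing group around the catalytic cysteine. This is a sketch only; the constant and function names are hypothetical, and the test sequence is synthetic, constructed just to satisfy the pattern.

```python
import re

# Class-I UCH consensus Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA],
# hand-translated to a regex; the active-site C is wrapped in a capturing group.
UCH1_SIGNATURE = re.compile(r"Q.{3}N[SA](C)G.{3}[LIVM]{2}H[SA][LIVM][SA]")


def active_site_cysteines(sequence: str) -> list[int]:
    """Return 1-based positions of the putative catalytic Cys in each signature hit."""
    return [m.start(1) + 1 for m in UCH1_SIGNATURE.finditer(sequence)]
```

With the synthetic sequence `MAAQGGGNSCGAAALLHSLA`, `active_site_cysteines` returns `[10]`; replacing that cysteine with alanine removes the hit, since the pattern requires C at that position.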

[1577] 682. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-1)

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins share only two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic
mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues] 
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
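The two UCH families are described in different units (about 25 Kd for family 1, 800 to 2000 residues for family 2). A rough conversion puts them on the same scale. It assumes the common rule of thumb of about 110 Da per amino acid residue, which is an approximation and not a value from the source; the function names are illustrative.

```python
# Rule-of-thumb average mass of an amino acid residue; an approximation only.
AVG_RESIDUE_MASS_DA = 110.0


def kda_to_residues(kda: float) -> int:
    """Approximate chain length for a protein of the given mass in kilodaltons."""
    return round(kda * 1000 / AVG_RESIDUE_MASS_DA)


def residues_to_kda(n_residues: int) -> float:
    """Approximate mass in kilodaltons for a chain of n_residues amino acids."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000
```

`kda_to_residues(25)` gives 227, whereas `residues_to_kda(800)` and `residues_to_kda(2000)` give 88.0 and 220.0, so family 2 members are roughly 3.5 to 9 times larger than family 1 enzymes.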
[1578] 683. Ubiquitin carboxyl-terminal hydrolases family 2 signatures (UCH-2) 

Ubiquitin carboxyl-terminal hydrolases (UCH) (deubiquitinating enzymes) [1,2] are thiol proteases that recognize and
hydrolyze the peptide bond at the C-terminal glycine of ubiquitin. These enzymes are involved in the processing of
poly-ubiquitin precursors as well as of ubiquitinated proteins. There are two distinct families of UCH. The second
class consists of large proteins (800 to 2000 residues) and is currently represented by: - Yeast UBP1, UBP2, UBP3,
UBP4 (or DOA4/SSV7), UBP5, UBP7, UBP9, UBP10, UBP11, UBP12, UBP13, UBP14, UBP15 and UBP16. - Human
tre-2. - Human isopeptidase T. - Human isopeptidase T-3. - Mammalian Ode-1. - Mammalian Unp. - Mouse Dub-1. -
Drosophila fat facets protein (gene faf). - Mammalian faf homolog. - Drosophila D-Ubp-64E. - Caenorhabditis elegans
hypothetical protein R10E11.3. - Caenorhabditis elegans hypothetical protein K02C4.3. These proteins share only two
regions of similarity. The first region contains a conserved cysteine which is probably implicated in the catalytic
mechanism. The second region contains two conserved histidine residues, one of which is also probably implicated in the
catalytic mechanism. Signature patterns for both conserved regions have been developed.

Consensus pattern: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q [C is the putative active site residue]

Consensus pattern: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y [The two H's are putative active site residues] 
[1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991). [2] D'Andrea A., Pellman D. Crit.
Rev. Biochem. Mol. Biol. 33:337-352(1998). [3] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).
[1579] 684. UDP-glycosyltransferases signature 

UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyze the addition of the glycosyl group from
a UDP-sugar to a small hydrophobic molecule. This family currently consists of: - Mammalian UDP-glucuronosyltransferases
(UDPGT) [1,2], a large family of membrane-bound microsomal enzymes which catalyze the transfer of glucuronic
acid to a wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major importance
in the detoxification and subsequent elimination of xenobiotics such as drugs and carcinogens. - A large number of
putative UDPGT from Caenorhabditis elegans. - Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyltransferase [3]
(also known as UDP-galactose-ceramide galactosyltransferase). This enzyme catalyzes the transfer of galactose to
ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant sphingolipids of the
myelin membrane of the central nervous system and peripheral nervous system. - Plant flavonol O(3)-glucosyltransferase,
an enzyme [4] that catalyzes the transfer of glucose from UDP-glucose to a flavonol. This reaction is essential and is
one of the last steps in anthocyanin pigment biosynthesis. - Baculovirus ecdysteroid UDP-glucosyltransferase
(EC 2.4.1.-) [5] (egt). This enzyme catalyzes the transfer of glucose from UDP-glucose to ecdysteroids, which are
insect molting hormones. The expression of egt in the insect host interferes with normal insect development by
blocking the molting process. - Prokaryotic zeaxanthin glucosyltransferase (gene crtX), an enzyme involved in
carotenoid biosynthesis that catalyzes the glycosylation reaction which converts zeaxanthin to
zeaxanthin-beta-diglucoside. - Streptomyces macrolide glycosyltransferases [6]. These enzymes specifically inactivate
macrolide antibiotics via 2'-O-glycosylation using UDP-glucose. These enzymes share a conserved domain of about
50 amino acid residues located in their C-terminal section, from which a pattern has been extracted to detect them.
Consensus pattern: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]-[HNQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-x(3)-[PA]-x(3)-[DES]-[QEHN]

[1] Dutton G.J. (In) Glucuronidation of drugs and other compounds. Dutton G.J., Ed., pp. 1-78, CRC Press, Boca Raton.
[2] ... DNA Cell Biol. 10:487-494(1991). [3] Schulte S., Stoffel W. Proc. Natl. Acad. Sci. U.S.A. ...
[6] ... Hernandez C., Olano C., Mendez C., Salas J.A. Gene 134:139-140(1993).
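The glycosyl transfer shared by the enzymes listed above can be summarized by a generic scheme in which R-OH stands for the acceptor (a xenobiotic, ceramide, a flavonol, an ecdysteroid, zeaxanthin or a macrolide). This is a textbook simplification, not a mechanism taken from the cited references; the specific product (for example an O-glucuronide or a galactosylceramide) depends on the sugar donor and the acceptor.

```latex
\begin{aligned}
\text{UDP-sugar} + \text{R-OH} &\longrightarrow \text{sugar-O-R} + \text{UDP} \\
\text{UDP-glucuronic acid} + \text{R-OH} &\longrightarrow \text{R-O-glucuronide} + \text{UDP} \quad \text{(UDPGT)} \\
\text{UDP-galactose} + \text{ceramide} &\longrightarrow \text{galactosylceramide} + \text{UDP}
\end{aligned}
```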
[1580] 685. UDP-glucose/GDP-mannose dehydrogenase family

The UDP-glucose/GDP-mannose dehydrogenases are a small group of enzymes which possess the ability to catalyze
the NAD-dependent 2-fold oxidation of an alcohol to an acid without the release of an aldehyde intermediate [2].
Number of members: 55
[1582] [1] Purification and characterization of guanosine diphospho-D-mannose dehydrogenase. A key enzyme in ...
Pseudomonas aeruginosa. Roychoudhury S, May TB, Gill JR, Singh SK, Feingold DS, Chakrabarty AM; J Biol Chem
1989;264:9380-9385. [2] Properties and kinetic analysis of UDP-glucose dehydrogenase ... van de Rijn I, Tanner ...
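The NAD-dependent 2-fold oxidation described above can be written as two hydride transfers on the same primary alcohol, with the aldehyde held on the enzyme rather than released. In this sketch R stands for the rest of the sugar nucleotide (for UDP-glucose, oxidation at C6 gives UDP-glucuronic acid), and the acid is written in its protonated form so that atoms and charges balance; the intermediate notation is illustrative.

```latex
\begin{aligned}
\text{R-CH}_2\text{OH} + \text{NAD}^+ &\longrightarrow [\text{R-CHO}]_{\text{enzyme-bound}} + \text{NADH} + \text{H}^+ \\
[\text{R-CHO}]_{\text{enzyme-bound}} + \text{H}_2\text{O} + \text{NAD}^+ &\longrightarrow \text{R-COOH} + \text{NADH} + \text{H}^+ \\
\text{R-CH}_2\text{OH} + \text{H}_2\text{O} + 2\,\text{NAD}^+ &\longrightarrow \text{R-COOH} + 2\,\text{NADH} + 2\,\text{H}^+ \quad \text{(net)}
\end{aligned}
```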

[1583] 686. Uracil-DNA glycosylase signature 

Uracil-DNA glycosylase (EC 3.2.2.-) (UNG) [1] is a DNA repair enzyme that excises uracil residues from DNA by
cleaving the N-glycosylic bond. Uracil in DNA can arise as a result of misincorporation of dUMP residues by DNA
polymerase or deamination of cytosine. The sequence of uracil-DNA glycosylase is extremely well conserved [2] in
bacteria and eukaryotes as well as in herpes viruses. More distantly related uracil-DNA glycosylases are also found
in poxviruses [3]. In eukaryotic cells, UNG activity is found in both the nucleus and the mitochondria. Human UNG1
protein is transported to both the mitochondria and the nucleus [4]. The N-terminal 77 amino acids of UNG1 seem to
be required for mitochondrial localization [4], but the presence of a mitochondrial transit peptide has not been directly
demonstrated. As a signature for this type of enzyme, the most N-terminal conserved region has been selected. This
region contains an aspartic acid residue which has been proposed, based on X-ray structures [5,6], to act as a general
base in the catalytic mechanism.
Consensus pattern: [KR]-[LIV]-[LIVC]-[LIVM]-x-G-[QI]-D-P-Y [D is the active site residue]

[1] ... 57:29-67(1988). [2] Olsen L.C., Aasland R., Wittwer C.U., Krokan ... 3121-3125(1989). [3] Upton C., Stuart D.T.,
McFadden G. Proc. Natl. Acad. Sci. U.S.A. ... [4] ... Aarsaether N., Bakke O., Krokan ... 2579-...(1993). [5] Savva R.,
McAuley-Hecht K., Brown T., Pearl L. Nature ... [6] ... Kavli B., Alseth I., Krokan H.E., Tainer J.A. Cell 80:869-...
[7] Muller S.J., Caradonna S. Biochim. Biophys. Acta 1088:197-207(1991). [8] Meyer-Siegler K., Mauro D.J., Seal G.,
Wurzer J., Deriel J.K., Sirover M.A. Proc. Natl. Acad. Sci. U.S.A. 88:8460-8464(1991). [9] Muller S.J., Caradonna S.
J. Biol. Chem. 268:1310-1319(1993).
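A toy string model can make the repair logic above concrete: cytosine deamination turns C into U, and UNG removes the uracil base by cleaving the N-glycosylic bond, leaving an abasic (AP) site in the strand for downstream repair. This is only an illustration under simplifying assumptions (a single strand as a string, `-` marking an AP site, hypothetical function names); it does not model enzyme kinetics or the later repair steps.

```python
def deaminate(dna: str, positions: list[int]) -> str:
    """Model cytosine deamination: convert C to U at the given 0-based positions."""
    bases = list(dna)
    for i in positions:
        if bases[i] != "C":
            raise ValueError(f"position {i} is {bases[i]!r}, not C")
        bases[i] = "U"
    return "".join(bases)


def ung_excise(dna: str) -> tuple[str, list[int]]:
    """Model UNG: remove every uracil base, leaving an abasic site marked '-'.

    Returns the processed strand and the 0-based positions of the new AP sites.
    """
    ap_sites = [i for i, base in enumerate(dna) if base == "U"]
    processed = "".join("-" if base == "U" else base for base in dna)
    return processed, ap_sites
```

For example, `deaminate("ACGTCA", [1])` gives `"AUGTCA"`, and `ung_excise("AUGTCA")` gives `("A-GTCA", [1])`.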

[1584] 687. Uncharacterized protein family UPF0001 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity:

- Yeast chromosome II hypothetical protein YBL036c. - Caenorhabditis elegans hypothetical protein F09E5.8. -
Bacillus subtilis hypothetical protein ylmE. - Escherichia coli hypothetical protein yggS and HI0090, the corresponding
Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein HP0395. - Mycobacterium tuberculosis
hypothetical protein MtCY270.20. - Synechocystis strain PCC 6803 hypothetical protein slr0556. - A Pseudomonas
aeruginosa hypothetical protein in the pilT 5' region. - A Vibrio alginolyticus hypothetical protein in the pilT 5' region.
These are proteins of 25 to 30 Kd which contain a number of conserved regions. The best conserved region, located
in the first third of these proteins, has been selected as a signature pattern.

Consensus pattern: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV]
[1] Bairoch A., Rudd K.E. Unpublished observations (1996).
[1585] 688. Uncharacterized protein family UPF0003 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Escherichia coli protein
aefA. - Escherichia coli hypothetical protein yggB. - Escherichia coli hypothetical protein yjeP and HI0195, the
corresponding Haemophilus influenzae protein. - Escherichia coli hypothetical protein ynaI. - A Bacillus subtilis
hypothetical protein. - Helicobacter pylori hypothetical protein HP0415. - A Synechocystis strain PCC 6803
hypothetical protein. - Archaeoglobus fulgidus hypothetical protein AF1546. - Methanococcus jannaschii hypothetical
protein MJ0170. - Methanococcus jannaschii hypothetical protein MJ1143. The size of these proteins ranges from
30 to 120 Kd. They all contain a number of transmembrane regions. The best conserved region, located in and just
after the last potential transmembrane region, has been selected as a signature pattern.

Consensus pattern: G-[STIF]-V-x(2)-[LIVM]-x(6)-[LIVMF]-x(3)-[DQ]-x(3)-[LIV]-x-[LIV]-P-N-x(2)-[LIVMF]-[LIVFSTA]-x(5)-N

[1] Bairoch A. Unpublished observations (1997).

[1586] 689. Uncharacterized protein family UPF0004 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Escherichia coli
hypothetical protein yliG. - Escherichia coli hypothetical protein yleA and HI0019, the corresponding Haemophilus
influenzae protein. - Bacillus subtilis hypothetical protein yqeV. - Helicobacter pylori hypothetical protein HP0269. -
Helicobacter pylori hypothetical protein HP0285. - Mycoplasma iowae hypothetical protein in the 16S RNA 5' region. -
Mycobacterium leprae hypothetical protein B2235_C2_195. - Pseudomonas aeruginosa hypothetical protein in the hemL
3' region. - Synechocystis strain PCC 6803 hypothetical protein slr0082. - Synechocystis strain PCC 6803 hypothetical
protein sll0996. - Methanococcus jannaschii hypothetical protein MJ0865. - Methanococcus jannaschii hypothetical
protein MJ0867. - Caenorhabditis elegans hypothetical protein F25B5.5. The size of these proteins ranges from 47 to
61 Kd. They contain six conserved cysteines, three of which are clustered in a region that can be used as a signature pattern.
Consensus pattern: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G 
[1] Bairoch A. Unpublished observations (1997). 
[1587] 690. Uncharacterized protein family UPF0005 signature 

The following proteins seem to be evolutionarily related [1]: - Mammalian protein TEGT (Testis Enhanced Gene
Transcript). - Escherichia coli hypothetical protein yccA and HI0044, the corresponding Haemophilus influenzae protein.
- A probable Pseudomonas aeruginosa ortholog of yccA. These are proteins of about 25 Kd which seem to contain
seven transmembrane domains. A signature pattern has been developed that corresponds to a region starting at the
beginning of the third transmembrane domain and ending in the middle of the fourth one.

Consensus pattern: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-[LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F

[1] Walter L., Marynen P., Szpirer J., Levan G., Guenther E. Genomics 28:301-304(1995).
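The statement that these proteins seem to contain seven transmembrane domains is the kind of inference usually drawn from a hydropathy plot. As a sketch of one common approach (not necessarily the one used by the cited authors), the Kyte-Doolittle scale can be averaged over a 19-residue window, with windows averaging at least 1.6 flagged as candidate membrane-spanning segments. The function names and the merging of overlapping windows into residue spans are illustrative choices.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; entry i covers residues i..i+window-1 (0-based)."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def candidate_tm_segments(sequence: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge runs of windows scoring >= threshold into 1-based (start, end) residue spans."""
    profile = hydropathy_profile(sequence, window)
    segments = []
    run_start = None
    for i, score in enumerate(profile):
        if score >= threshold and run_start is None:
            run_start = i
        elif score < threshold and run_start is not None:
            # Windows run_start..i-1 passed; they cover 1-based residues run_start+1..i-1+window.
            segments.append((run_start + 1, i - 1 + window))
            run_start = None
    if run_start is not None:
        segments.append((run_start + 1, len(profile) - 1 + window))
    return segments
```

For a synthetic sequence of 20 lysines, 22 leucines and 20 lysines, `candidate_tm_segments` reports a single span, `(16, 47)`. The window averaging widens the span a few residues beyond the hydrophobic core (positions 21 to 42), which is why such predictions give only approximate boundaries.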
[1588] 691. Uncharacterized protein family UPF0006 signatures 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Yeast chromosome II
hypothetical protein YBL055c. - Escherichia coli hypothetical protein ycfH and HI0454, the corresponding Haemophilus
influenzae protein. - Escherichia coli hypothetical protein yigW. - Escherichia coli hypothetical protein yjjV and HI0081,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yabD. - Haemophilus
influenzae hypothetical protein HI1664. - Mycoplasma genitalium hypothetical protein MG009. These are proteins of
24 to 47 Kd which contain a number of conserved regions. They can be picked up in the database by the following
patterns.

Consensus pattern: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN]
Consensus pattern: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE]

Consensus pattern: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P
[1] Bairoch A., Rudd K.E. Unpublished observations (1995).
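Because this family is described by three patterns, a database search can require more than one of them to match. The sketch below hand-translates the three consensus patterns into regular expressions and reports entries that hit at least two. The two-pattern threshold, the names and the synthetic test sequences are illustrative assumptions, not part of the original description.

```python
import re

# The three UPF0006 consensus patterns, hand-translated to regular expressions.
UPF0006_SIGNATURES = {
    "sig1": re.compile(r"[LIVMFY]{2}D[STA]H.H[LIVMF][DN]"),
    "sig2": re.compile(r"P[LIVM].[LIVM]H.R.[TA].[DE]"),
    "sig3": re.compile(r"[LVSA][LIVA].{2}[LIVM][PS].{3}L[LIVM][LIVMS]ETD.P"),
}


def signatures_found(sequence: str) -> list[str]:
    """Names of the signatures that occur at least once in the sequence."""
    return [name for name, rx in UPF0006_SIGNATURES.items() if rx.search(sequence)]


def scan_database(db: dict[str, str], min_signatures: int = 2) -> dict[str, list[str]]:
    """Return {entry_id: matched signatures} for entries matching >= min_signatures patterns."""
    hits = {}
    for entry_id, sequence in db.items():
        found = signatures_found(sequence)
        if len(found) >= min_signatures:
            hits[entry_id] = found
    return hits
```

Requiring several independent signatures trades some sensitivity for fewer chance matches; lowering `min_signatures` to 1 reproduces a plain single-pattern search.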
[1589] 692. Uncharacterized protein family UPF0007 signature 

The following proteins seem to be evolutionarily related [1]: - Escherichia coli hypothetical protein ygbP and HI0672,
the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacM. - Mycobacterium
tuberculosis hypothetical protein MtCY06G11.29c. - Synechocystis strain PCC 6803 hypothetical protein slr0951. - A
Rhodobacter capsulatus hypothetical protein in the nifR3 5' region. Except for the Rhodobacter protein, which contains a
C-terminal extension, all these proteins have from 225 to 236 amino acids. They are hydrophilic proteins that can be
picked up in the database by the following pattern.
Consensus pattern: V-L-[IV]-H-D-[GA]-A-R
[1] Bairoch A. Unpublished observations (1997).
[1590] 693. Uncharacterized protein family UPF0015 signature

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Yeast chromosome II
hypothetical protein YBR002c. - Yeast chromosome XIII hypothetical protein YMR101c. - Escherichia coli hypothetical
protein yaeU and HI0920, the corresponding Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein
HP1?21. - Mycobacterium leprae hypothetical protein B1937_F2_65. - A Corynebacterium glutamicum hypothetical
protein in the aroF 3' region. - A Streptomyces fradiae hypothetical protein in transposon Tn4556. - Synechocystis strain
PCC 6803 hypothetical protein sll0505. - Methanococcus jannaschii hypothetical protein MJ1372. These are proteins
of about 26 to 40 Kd whose central region is well conserved. They can be picked up in the database by the following
pattern.

Consensus pattern: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-W-Q

[1] Wolfe K.H., Lohan A.J.E. Yeast 10:S41-S46(1994).

[1591] 694. Uncharacterized protein family UPF0016 signature
The following uncharacterized proteins have been shown [1] to share regions of similarity: - Yeast hypothetical protein
YBR187W. - Fission yeast hypothetical protein SpAC17G8.08c. - Mouse protein pFT27. - Synechocystis strain PCC
6803 hypothetical protein sll0615. These are hydrophobic proteins of 200 to 320 amino acids that seem to contain six
or seven transmembrane domains. A conserved region which, in the eukaryotic proteins of this family, seems to directly
follow the second transmembrane domain has been selected as a signature pattern.
Consensus pattern: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A
[1] Bairoch A. Unpublished observations (1996).
[1592] 695. Uncharacterized protein family UPF0021 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Yeast chromosome VII
hypothetical protein YGL211W. - Dictyostelium discoideum protein veg136. - Methanococcus jannaschii hypothetical
proteins MJ1157 and MJ1478. These are proteins of 300 to 360 residues. They can be picked up in the database
by the following pattern, which is located in their N-terminal section.
Consensus pattern: C-K-x(2)-F-x(4)-E-x(22,23)-S-G-G-K-D-V

[1] Bairoch A. Unpublished observations (1997).

[1593] 696. Uncharacterized protein family UPF0023 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Mouse protein 22A3. -
Yeast chromosome XII hypothetical protein YLR022c. - Caenorhabditis elegans hypothetical protein W06E11.4. -
Methanococcus jannaschii hypothetical protein MJ0592. These are hydrophilic proteins of about 30 Kd. They can be
picked up in the database by the following pattern.
[1594] Consensus pattern: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G
[1595] [1] Bairoch A. Unpublished observations (1997).

[1596] 697. Uncharacterized protein family UPF0024 signature. The following uncharacterized proteins have been
shown [1] to share regions of similarity: - Escherichia coli hypothetical protein ygbO and HI0701, the corresponding
Haemophilus influenzae protein. - Helicobacter pylori hypothetical protein HP0926. - Yeast chromosome XV hypothetical
protein YOR243c. - Caenorhabditis elegans hypothetical protein B0024.11. - Methanococcus jannaschii hypothetical
proteins MJ0588 and MJ1364. These are hydrophilic proteins of 39 to 77 Kd. They can be picked up in the
database by the following pattern.

[1597] Consensus pattern: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC]

[1] Bairoch A. Unpublished observations (1997).

[1598] 698. Uncharacterized protein family UPF0025 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Escherichia coli
hypothetical protein yfcE. - Bacillus subtilis hypothetical protein ysnB. - Mycoplasma genitalium and Mycoplasma
pneumoniae hypothetical protein MG207. - Methanococcus jannaschii hypothetical proteins MJ0623 and MJ0936. These
are hydrophilic proteins of about 20 Kd. They can be picked up in the database by the following pattern.
Consensus pattern: D-V-[LIV]-x(2)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G
[1] Bairoch A. Unpublished observations (1997).
[1599] 699. Uncharacterized protein family UPF0029 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Yeast chromosome III
hypothetical protein YCR59c. - Yeast chromosome IV hypothetical protein YDL177c. - Escherichia coli hypothetical
protein yigZ and HI0722, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein
yvyE. - A Thermus aquaticus hypothetical protein in the pol 5' region. These proteins can be picked up in the database
by the following pattern.

Consensus pattern: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-G-x(2)-[LIVM]-G
[1] Koonin E.V., Bork P., Sander C. EMBO J. 13:493-503(1994).
[1600] 700. Uncharacterized protein family UPF0030 signature 

The following uncharacterized proteins have been shown [1] to be highly similar: - Yeast chromosome VI hypothetical
protein YFL060c. - Yeast chromosome XIII hypothetical protein YMR095c. - Yeast chromosome XIV hypothetical protein
YNL334c. - Bacillus subtilis hypothetical protein yaaE. - Haemophilus influenzae hypothetical protein HI1648. -
Methanococcus jannaschii hypothetical protein MJ1661. These are hydrophilic proteins of about 19 to 25 Kd. They can
be picked up in the database by the following pattern.
Consensus pattern: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA]
[1] Bairoch A. Unpublished observations (1997).
[1601] 701. Uncharacterized protein family UPF0032 signature

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Escherichia coli
hypothetical protein yigU and HI0188, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical
protein ycbT. - Mycobacterium tuberculosis hypothetical protein MtCY49.33c and U2126A, the corresponding
Mycobacterium leprae protein. - Synechocystis strain PCC 6803 hypothetical protein sll0194. - Odontella sinensis and
Porphyra purpurea chloroplast hypothetical protein ycf43. These proteins have from 245 to 317 amino acids and seem to
[1602] 702. Uncharacterized protein family UPF0034 signature

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Escherichia coli
hypothetical protein yhdG and HI0979, the corresponding Haemophilus influenzae protein. - A further Escherichia coli
hypothetical protein and its Haemophilus influenzae counterpart. - Escherichia coli hypothetical protein yohI and
HI0270, the corresponding Haemophilus influenzae protein. - Bacillus subtilis hypothetical protein yacF. - Rhodobacter
capsulatus protein nifR3 and related proteins in Azospirillum brasilense and Rhizobium leguminosarum. - Synechocystis
strain PCC 6803 hypothetical protein slr0644 and a second Synechocystis strain PCC 6803 hypothetical protein. -
Caenorhabditis elegans hypothetical protein C45G9.2. - Yeast protein SMM1. - Yeast hypothetical protein YLR405w. -
Yeast hypothetical protein YML080w. Although it has been proposed [2] that Rhodobacter capsulatus nifR3 is a
transcriptional regulatory protein, it is believed that these proteins constitute a family of enzymes whose active site
could include a conserved cysteine, which has been used as the central part of a signature pattern.

Consensus pattern: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMASQ]-x(5)-G-[SAC]

[1] ... Unpublished observations (1995). [2] Foster-Hartnett D., Cullen P.J., Gabbert K.K., Kranz R.G. Mol.
Microbiol. 8:903-914(1993).

[1603] 703. Uncharacterized protein family UPF0038 signature 

The following uncharacterized proteins have been shown [1] to share regions of similarity: - Escherichia coli
hypothetical protein yacE and HI0890, the corresponding Haemophilus influenzae protein. - Mycobacterium tuberculosis
hypothetical protein MtCY01B2.23 and the corresponding Mycobacterium leprae protein. - A Synechocystis strain
PCC 6803 hypothetical protein. - Other hypothetical proteins from Aeromonas hydrophila, Bacteroides nodosus,
Neisseria gonorrhoeae, Pseudomonas putida, Thermus thermophilus and Xanthomonas campestris. - A human protein.
- A Caenorhabditis elegans hypothetical protein.

These proteins contain, at their N-terminal extremity, an ATP/GTP-binding motif A (P-loop). The size of these
proteins ranges from 200 to 290 residues (with the exception of the mycobacterial sequences, which are 410 residues
long). A conserved region some 50 residues away from the ATP-binding P-loop has been used to develop a signature
pattern.

Consensus pattern: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV]
[1] Rudd K.E., Bairoch A. Unpublished observations.
[1604] 704. Ubiquitin-conjugating enzymes active site

Ubiquitin-conjugating enzymes (UBC or E2 enzymes) catalyze the covalent attachment of ubiquitin to target
proteins. An activated ubiquitin moiety is transferred from a ubiquitin-activating enzyme (E1) to E2, which later ligates
ubiquitin to substrate proteins, with or without the assistance of substrate-recognizing proteins (E3). In most
species there are many forms of UBC (at least 9 in yeast), which are implicated in diverse cellular functions. A cysteine
residue is required for ubiquitin-thioester formation. There is a single conserved cysteine in UBCs, and the region
around that residue is conserved in the sequences of known UBC isozymes. That region has been used as a signature
pattern.

Consensus pattern: ...-[LIV] [C is the active site residue]

[1] ... 15:195-198(1990). [2] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).
[3] Hershko A. Trends Biochem. Sci. 16:265-268(1991).
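The E1 to E2 to target transfer described above can be written as a thioester relay. The ATP-dependent activation step and the isopeptide product are standard background on the ubiquitin pathway rather than statements from this entry; in this sketch `~` denotes a thioester bond to the active-site cysteine, and E3 is shown as an optional participant in the last step.

```latex
\begin{aligned}
\text{E1-SH} + \text{Ub} + \text{ATP} &\longrightarrow \text{E1-S}{\sim}\text{Ub} + \text{AMP} + \text{PP}_i \\
\text{E1-S}{\sim}\text{Ub} + \text{E2-SH} &\longrightarrow \text{E1-SH} + \text{E2-S}{\sim}\text{Ub} \\
\text{E2-S}{\sim}\text{Ub} + \text{target-Lys-NH}_2 &\xrightarrow{\;(\text{E3})\;} \text{target-Lys-NH-CO-Ub} + \text{E2-SH}
\end{aligned}
```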

[1605] 705. Uroporphyrinogen decarboxylase signatures

Uroporphyrinogen decarboxylase (URO-D) catalyzes the decarboxylation of the four acetate side chains of
uroporphyrinogen to yield coproporphyrinogen [1]. URO-D deficiency is responsible for the human genetic diseases
familial porphyria cutanea tarda (fPCT) and hepatoerythropoietic porphyria (HEP). The sequence of URO-D has been
well conserved throughout evolution. The best conserved region contains a perfectly conserved hexapeptide with two
arginine residues; it has been used as a first signature pattern. A second signature pattern is based on another well
conserved region, located in the central section of the protein.

Consensus pattern: P-x-W-x-M-R-Q-A-G-R

Consensus pattern: G-F-[STAGCV]-[STAGC]-x-P-[FYW]-T-[LV]-x(2)-Y-x(2)-[AE]-[GK]

[1] ... Labbe ... J. Biochem. 205:...(1992).
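The reaction described at the start of this entry removes one CO2 from each of the four carboxymethyl (acetic acid) side chains, leaving a methyl group in place of each. As a balanced summary, written for the free acids, this is standard heme-pathway chemistry rather than a formula taken from the cited reference:

```latex
\begin{aligned}
-\text{CH}_2\text{COOH} &\longrightarrow -\text{CH}_3 + \text{CO}_2 \quad \text{(each side chain)} \\
\text{uroporphyrinogen} &\longrightarrow \text{coproporphyrinogen} + 4\,\text{CO}_2
\end{aligned}
```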

[1606] 706. ubiE/COQ5 methyltransferase family signatures 

The following methyltransferases have been shown [1] to share regions of similarity: - Escherichia coli ubiE, which
is involved in both ubiquinone and menaquinone biosynthesis and which catalyzes the S-adenosylmethionine-dependent
methylation of 2-polyprenyl-6-methoxy-1,4-benzoquinol into 2-polyprenyl-3-methyl-6-methoxy-1,4-benzoquinol
and of demethylmenaquinol into menaquinol. - Yeast COQ5, a ubiquinone biosynthesis methyltransferase. - Bacillus
subtilis spore germination protein C2 (gene: gerCB or gerC2), a probable menaquinone biosynthesis methyltransferase.
- Lactococcus lactis gerC2 homolog. - Caenorhabditis elegans hypothetical protein ZK652.9. - Leishmania donovani
amastigote-specific protein Ml. These are hydrophilic proteins of about 30 Kd (except for ZK652.9, which is 65 Kd).
They can be picked up in the database by the following patterns.
Consensus pattern: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W 
Consensus pattern: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S
[1] Lee P.T., Hsu A.Y., Ha H.T., Clarke C.F. J. Bacteriol. 179:1748-1754(1997).
[1607] 707. Uricase signature 

Uricase (urate oxidase) [1] is the peroxisomal enzyme responsible for the degradation of urate into allantoin. Some
species, like primates and birds, have lost the gene for uricase and are therefore unable to degrade urate. Uricase is
a protein of 300 to 400 amino acids. A highly conserved region located in the central part of the sequence has been
used as a signature pattern.

Consensus pattern: [LV]-x-[LV]-[LIV]-K-[STV]-[ST]-x-[SN]-x-F-x(2)-[FY]-x(4)-[FY]-x(2)-L-x(5)-R
[1] Motojima K., Kanaya S., Goto S. J. Biol. Chem. 263:16677-16681(1988).
[1608] 708. Universal stress protein family (Usp) 

[1609] Members of the Usp family are induced by a wide range of stress conditions and are predicted to be related to
the MADS-box transcription factors and to bind DNA [2]. Number of members: 39

[1] Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Nystrom T,
Neidhardt FC; Mol Microbiol 1994;11:537-544.

[2] Sequence analysis of eukaryotic developmental proteins: ancient and novel domains. Mushegian AR, Koonin
EV; Genetics 1996;144:817-828.

[1610] 709. Ubiquitin domain signature and profile 

Ubkiuitin [1.2,3] is a protein of seventy six amino acid residues, found in all eukaryotic cells and whose sequence is 
extremely well conserved from protozoan to vertebrates. It plays a key role in a variety of cellular processes such as 
ATP-dependent selective degradation of cellular proteins,maintenance of chromatin stmcture, regulatton of gene ex- 
pression, stress response and ribosome biogenesis. In most species, there are many genes coding for ubiquitin How- 
ever they can be classified into two classes. The first class produces polyubiquitin molecules consisting of exact head 
to tail repeats of ubiquitin. The number of repeats is variable (up to twelve in a Xenopus gene). In the majority of 
polyubiquitin precursors, there is a final amino-acid after the last repeat. The second class of genes produces precursor 
proteins consisting of a single copy of ubiquitin fused to a C4erminal extension protein (CEP). There are two types of 
CEP proteins and both seem to be ribosomal proteins. Ubiquitin is a globular protein, the last four C-terminal reskJues 
(Leu-Arg- Gly-Gly) extending from the compact structure to form a tail', important for its function. The latter is mediated 
by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage between the C-terminal glycine and 
the epsilon amino group of lysine residues in the target proteins. There are a number of proteins which are evolutionarily related to ubiquitin:

- Ubiquitin-like proteins from baculoviruses as well as in some strains of bovine viral diarrhea viruses (BVDV). These proteins are highly similar to their eukaryotic counterparts.
- Mammalian protein GDX [4]. GDX is composed of two domains, a N-terminal ubiquitin-like domain of 74 residues and a C-terminal domain of 83 residues with some similarity with the thyroglobulin hormonogenic site.
- Mammalian protein FAU [5]. FAU is a fusion protein which consists of a N-terminal ubiquitin-like protein of 74 residues fused to ribosomal protein S30.
- Mouse protein NEDD-8 [6], a ubiquitin-like protein of 81 residues.
- Human protein BAT3, a large fusion protein of 1132 residues that contains a N-terminal ubiquitin-like domain.
- Caenorhabditis elegans protein ubl-1 [7]. Ubl-1 is a fusion protein which consists of a N-terminal ubiquitin-like protein of 70 residues fused to ribosomal protein S27A.
- Yeast DNA repair protein RAD23 [8]. RAD23 contains a N-terminal domain that seems to be distantly, yet significantly, related to ubiquitin.
- Mammalian RAD23-related proteins RAD23A and RAD23B.
- Mammalian BCL-2 binding athanogene-1 (BAG-1). BAG-1 is a protein of 274 residues that contains a central ubiquitin-like domain.
- Human spliceosome associated protein 114 (SAP 114 or SF3A120).
- Yeast protein DSK2, a protein involved in spindle pole body duplication and which contains a N-terminal ubiquitin-like domain.
- Human protein CKAP1/TFCB, Schizosaccharomyces pombe protein alp11 and Caenorhabditis elegans hypothetical protein F53F4.3. These proteins contain a N-terminal ubiquitin domain and a C-terminal CAP-Gly domain.
- Schizosaccharomyces pombe hypothetical protein SpAC26A3.16. This protein contains a N-terminal ubiquitin domain.
- Yeast protein SMT3.
- Human ubiquitin-like proteins SMT3A and SMT3B.
- Human ubiquitin-like protein SMT3C (also known as PIC1, Ubl1, Sumo-1, Gmp-1 or Sentrin). This protein is involved in targeting ranGAP1 to the nuclear pore complex protein ranBP2.
- SMT3-like proteins in plants and Caenorhabditis elegans.

To identify ubiquitin and related proteins, a pattern has been developed based on conserved positions in the central section of the sequence. A profile was also developed that spans the complete length of the ubiquitin domain.
Consensus pattern: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE]
[ 1] Jentsch S., Seufert W., Hauser H.-P. Biochim. Biophys. Acta 1089:127-139(1991).
[ 2] Monia B.P., Ecker D.J., Crooke S.T. Bio/Technology 8:209-215(1990).
[ 3] Finley D., Varshavsky A. Trends Biochem. Sci. 10:343-347(1985).
[ 4] Filippi M., Tribioli C., Toniolo D. Genomics 7:453-457(1990).
[ 5] Olvera J., Wool I.G. J. Biol. Chem. 268:17967-17974(1993).
[ 6] Kumar S., Yoshida Y., Noda M. Biochem. Biophys. Res. Commun. 195:393-399(1993).
[ 7] Jones D., Candido E.P. J. Biol. Chem. 268:19545-19551(1993).
[ 8] Melnick L., Sherman F. J. Mol. Biol. 233:372-388(1993).
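
The consensus patterns in this section follow PROSITE-style syntax. Elements are separated by hyphens. An x stands for any residue, square brackets list allowed residues, curly braces list excluded residues, and a number in parentheses gives a repeat count. The sketch below uses Python's standard `re` module to translate such a pattern into a regular expression and search a sequence with it. The helper name `prosite_to_regex` is hypothetical and is not part of any published tool. The pattern string is our reading of the ubiquitin pattern above, so treat it as an assumption wherever the print is damaged. It is applied to the 76-residue human ubiquitin sequence.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex.

    Supported syntax: elements separated by '-', 'x' for any residue,
    [ABC] for allowed residues, {ABC} for excluded residues, repeat counts
    such as x(3), x(2,4) or [LIVMF](3), '<' / '>' anchors at the N- / C-
    terminus, and an optional trailing '.'.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Separate optional anchors and an optional "(n)" or "(n,m)" count.
        m = re.fullmatch(r"(<?)(.+?)(?:\((\d+(?:,\d+)?)\))?(>?)", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        start, core, count, end = m.groups()
        if core == "x":
            regex = "."                          # any residue
        elif core[0] == "[" and core[-1] == "]":
            regex = core                         # allowed residues
        elif core[0] == "{" and core[-1] == "}":
            regex = "[^" + core[1:-1] + "]"      # excluded residues
        elif len(core) == 1 and core.isalpha():
            regex = core                         # single fixed residue
        else:
            raise ValueError(f"unsupported element {element!r}")
        if count:
            regex += "{" + count + "}"
        parts.append(("^" if start else "") + regex + ("$" if end else ""))
    return "".join(parts)


# Our reading of the ubiquitin signature (assumption where the print is damaged).
UBIQUITIN_PATTERN = ("K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-"
                     "[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE]")
HUMAN_UBIQUITIN = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDG"
                   "RTLSDYNIQKESTLHLVLRLRGG")

hit = re.search(prosite_to_regex(UBIQUITIN_PATTERN), HUMAN_UBIQUITIN)
if hit:
    print(hit.start() + 1, hit.group())  # 1-based start and matched segment
```

The script prints `27 KAKIQDKEGIPPDQQRLIFAGKQLED`: the pattern covers residues 27 to 52 of ubiquitin. Keeping the translation as a plain string lets the same regex be reused with `re.finditer` to collect every hit in a longer sequence.
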
[1611] 710. VHS domain 

[1612] Domain present in VPS-27, Hrs and STAM. Number of members: 27 
[1613] 711. Vinculin family signatures

Vinculin [1] is a eukaryotic protein that seems to be involved in the attachment of the actin-based microfilaments to the plasma membrane. Vinculin is located at the cytoplasmic side of focal contacts or adhesion plaques. In addition to actin, vinculin interacts with other structural proteins such as talin and alpha-actinins. Vinculin is a large protein of 116 Kd (about 1000 residues). Structurally, the protein consists of an acidic N-terminal domain of about 90 Kd separated from a basic C-terminal domain of about 25 Kd by a proline-rich region of about 50 residues. The central part of the N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110 amino acid domain. Catenins [2] are proteins that associate with the cytoplasmic domain of a variety of cadherins. The association of catenins to cadherins produces a complex which is linked to the actin filament network, and which seems to be of primary importance for the cell-adhesion properties of cadherins. Three different types of catenins seem to exist: alpha, beta, and gamma. Alpha-catenins are proteins of about 100 Kd which are evolutionarily related to vinculin. In terms of their structure, the most significant differences are the absence, in alpha-catenin, of the repeated domain and of the proline-rich segment. Two signature patterns for this family of proteins have been developed. The first pattern is located in the N-terminal section of both vinculin and alpha-catenins and is part, in vinculin, of a domain that seems to be involved in the interaction with talin. The second pattern is based on a conserved region in the N-terminal part of the repeated domain of vinculin.

Consensus pattern: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-Q-E-L
Consensus pattern: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P

[ 1] Otto J.J. Cell Motil. Cytoskeleton 16:1-6(1990).
[ 2] Herrenknecht K., Ozawa M., Eckerskorn C., Lottspeich F., Lenter M., Kemler R. Proc. Natl. Acad. Sci. U.S.A. 88:9156-9160(1991).
[1614] 712. (Vitellogenin N) Lipoprotein amino terminal region 

[1615] This family contains regions from vitellogenin, microsomal triglyceride transfer protein and apolipoprotein B-100. These proteins are all involved in lipid transport [1]. This family contains the LV1n chain from lipovitellin, which contains two structural domains. Number of members: 33

[1616] [1] The structural basis of lipid interactions in lipovitellin, a soluble lipoprotein. Anderson TA, Levitt DG, Banaszak LJ; Structure 1998;6:895-909.

[1617] 713. (VMSA) Major surface antigen from hepadnavirus 
[1618] 714. ssDNA binding protein (Viral DNA bp) 
This protein is found in herpesviruses and is needed for replication.
[1619] 715. (Voltage CLC) Voltage gated chloride channels

[1620] This family of ion channels contains 10 or 12 transmembrane helices. Each protein forms a single pore. It has been shown that some members of this family form homodimers. These proteins contain two CBS domains.


[1621] 716. von Willebrand factor type A domain (vwa)

More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.

Bork P, Rohde K;

Biochem J 1991;279:908-911.

1. RUGGERI, Z.M. and WARE, J. 
von Willebrand factor. 

FASEB J. 7 308-316(1993). 

2. COLOMBATTI, A., BONALDO, P. and DOLIANA, R.

Type A modules: interacting domains found in several non-fibrillar collagens and in other extracellular matrix proteins.

MATRIX 13 297-306 (1993). 

3. PERKINS, S.J., SMITH, K.R., WILLIAMS, S.C., HARIS, P.I., CHAPMAN, D. and SIM, R.B.

The secondary structure of the von Willebrand factor type A domain in factor B of human complement by Fourier transform infrared spectroscopy.

Its occurrence in collagen types VI, VII, XII and XIV, the integrins and other proteins by averaged structure predictions.

J.MOL.BIOL. 238 104-119 (1994).

4. BORK, P. and ROHDE, K.

More von Willebrand factor type A domains? Sequence similarities with malaria thrombospondin-related anonymous protein, dihydropyridine-sensitive calcium channel and inter-alpha-trypsin inhibitor.
BIOCHEM.J. 279 908-910 (1991). 

5. EDWARDS, Y.J.K. and PERKINS, S.J.

The protein fold of the von Willebrand factor type A domain is predicted to be similar to the open twisted beta-sheet flanked by alpha-helices found in human ras-p21.
FEBS LETT. 358 283-286 (1995). 

6. LEE, J.O., RIEU, P., ARNAOUT, M.A. and LIDDINGTON, R.

Crystal structure of the A domain from the alpha subunit of integrin CR3 (CD11b/CD18).
CELL 80 631-638 (1995).

7. QU, A. and LEAHY, D.J.

Crystal structure of the I-domain from the CD11a/CD18 (LFA-1, alpha L beta 2) integrin.
PROC.NATL.ACAD.SCI.USA 92 10277-10281 (1995).

[1622] The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved in the aetiology of bleeding disorders [1]. In von Willebrand factor, the type A domain (vWF) is the prototype for a protein superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins [2-4]. Proteins that incorporate vWF domains participate in numerous biological events (e.g., cell adhesion, migration, homing, pattern formation and signal transduction), involving interaction with a large array of ligands [2]. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of alpha-helices and beta-strands [3]. Fold recognition algorithms were used to score sequence compatibility with a library of known structures; the vWF domain fold was predicted to be a doubly-wound, open, twisted beta-sheet flanked by alpha-helices [5]. 3D structures have been determined for the I-domains of integrins CD11b (with bound magnesium) [6] and CD11a (with bound manganese) [7]. The domain adopts a classic alpha/beta Rossmann fold and contains an unusual metal ion coordination site at its surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for binding protein ligands [6]. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are completely conserved, but the manner in which the metal ion is coordinated differs slightly [7].

[1623] VWFADOMAIN is a 3-element fingerprint that provides a signature for the vWF domain superfamily. The fingerprint was derived from an initial alignment of 14 sequences. Motif 1 includes the first beta-strand and 3 conserved residues involved in metal ion coordination in I-domains (Asp and 2 serines in positions 8, 10 and 12, respectively); motif 2 spans strands beta-2 and beta-2'; and motif 3 encodes beta-strand 3 and a conserved Asp (in position 7) which coordinates the metal ion [6,7]. Three iterations on OWL27.0 were required to reach convergence, at which point a true set comprising 56 sequences was identified. Numerous partial matches were also found.

[1624] 717. (WD40) WD domain, G-beta repeat

The ancient regulatory-protein family of WD-repeat proteins. 

Neer EJ, Schmidt CJ, Nambudripad R, Smith TF; 

Nature 1994;371:297-300.

Beta-transducin (G-beta) is one of the three subunits (alpha, beta, and gamma) of the guanine nucleotide-binding 
proteins (G proteins) which act as intermediaries in the transduction of signals generated by transmembrane receptors 
[1]. The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but 
they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.

[1625] In higher eukaryotes, G-beta exists as a small multigene family of highly conserved proteins of about 340 amino acid residues. Structurally, G-beta consists of eight tandem repeats of about 40 residues, each containing a central Trp-Asp motif (this type of repeat is sometimes called a WD-40 repeat). Such a repetitive segment has been shown [E1,2,3,4,5] to exist in a number of other proteins listed below:

- Yeast STE4, a component of the pheromone response pathway. STE4 is a G-beta like protein that associates with GPA1 (G-alpha) and STE18 (G-gamma).

- Yeast MSI1, a negative regulator of RAS-mediated cAMP synthesis. MSI1 is most probably also a G-beta protein.

- Human and chicken protein 12.3. The function of this protein is not known, but on the basis of its similarity to G-beta proteins, it may also function in signal transduction.

- Chlamydomonas reinhardtii gblp. This protein is most probably the homolog of vertebrate protein 12.3.

- Human LIS1, a neuronal protein involved in type-1 lissencephaly [E2].

- Mammalian coatomer beta' subunit (beta'-COP), a component of a cytosolic protein complex that reversibly associates with Golgi membranes to form vesicles that mediate biosynthetic protein transport.

- Yeast CDC4, essential for initiation of DNA replication and separation of the spindle pole bodies to form the poles of the mitotic spindle.

- Yeast CDC20, a protein required for two microtubule-dependent processes: nuclear movements prior to anaphase and chromosome separation.

- Yeast MAK11, essential for cell growth and for the replication of M1 double-stranded RNA.

- Yeast PRP4, a component of the U4/U6 small nuclear ribonucleoprotein with a probable role in mRNA splicing.

- Yeast PWP1, a protein of unknown function.

- Yeast SKI8, a protein essential for controlling the propagation of double-stranded RNA.

- Yeast SOF1, a protein required for ribosomal RNA processing which associates with U3 small nucleolar RNA.

- Yeast TUP1 (also known as AER2 or SFL2 or CYC9), a protein which has been implicated in dTMP uptake, catabolite repression, mating sterility, and many other phenotypes.

- Yeast YCR57C, an ORF of unknown function from chromosome III.

- Yeast YCR72C, an ORF of unknown function from chromosome III.

- Slime mold coronin, an actin-binding protein.

- Slime mold AAC3, a developmentally regulated protein of unknown function.

- Drosophila protein Groucho (formerly known as E(spl): 'enhancer of split'), a protein involved in neurogenesis and that seems to interact with the Notch and Delta proteins.

- Drosophila TAF-II-80, a protein that is tightly associated with TFIID.

[1626] The number of repeats in the above proteins varies between 5 (PRP4, TUP1, and Groucho) and 8 (G-beta, STE4, MSI1, AAC3, CDC4, PWP1, etc.). In G-beta and G-beta like proteins, the repeats span the entire length of the sequence, while in other proteins, they make up the N-terminal, the central or the C-terminal section.
[1627] A signature pattern can be developed from the central core of the domain (positions 9 to 23).

- Consensus pattern: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]
[ 1] Gilman A.G.

Annu. Rev. Biochem. 56:615-649(1987).

[ 2] Duronio R.J., Gordon J.I., Boguski M.S.

Proteins 13:41-56(1992).

[ 3] van der Voorn L., Ploegh H.L.

FEBS Lett. 307:131-134(1992).

[ 4] Neer E.J., Schmidt C.J., Nambudripad R., Smith T.F.

Nature 371:297-300(1994).

[ 5] Smith T.F., Gaitatzes C.G., Saxena K., Neer E.J.

Biochemistry In Press(1998).
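
A rough way to estimate how many WD repeats a protein carries is to count non-overlapping hits of the core pattern. The sketch below translates our reading of the 15-position WD core pattern above into a Python regular expression by hand. The helper `wd_core_positions` and the synthetic test sequence are illustrative assumptions. The pattern is short and degenerate, so real repeats that deviate from the consensus can be missed and chance matches can occur. A hit count is therefore only an approximation of the true repeat number.

```python
import re

# Our reading of the 15-position WD-repeat core pattern (positions 9 to 23).
WD_CORE = re.compile(
    r"[LIVMSTAC][LIVMFYWSTAGC][LIMSTAG][LIVMSTAGC].{2}[DN].{2}"
    r"[LIVMWSTAC].[LIVMFSTAG]W[DEN][LIVMFSTAGCN]"
)


def wd_core_positions(sequence: str) -> list[int]:
    """Return 1-based start positions of non-overlapping WD core matches."""
    return [m.start() + 1 for m in WD_CORE.finditer(sequence)]


# Synthetic example: five copies of a consensus-like core separated by a
# linker that contains no Trp, so each core yields exactly one hit.
core, linker = "LASGSSDGTIKLWDL", "PEKHQRPE"
example = "MK" + (core + linker) * 5
positions = wd_core_positions(example)
print(len(positions), positions)  # 5 [3, 26, 49, 72, 95]
```

The proteins listed above carry between 5 and 8 repeats. A hit count well outside that range would be a reason to inspect the alignment by hand rather than trust the pattern alone.
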

[1628] 718. WHEP-TRS domain containing proteins

A conserved domain of 46 amino acids has been shown [1] to exist in a number of higher eukaryote aminoacyl-transfer RNA synthetases. This domain is present one to six times in the following enzymes:
- Mammalian multifunctional aminoacyl-tRNA synthetase. The domain is present three times in a region that separates the N-terminal glutamyl-tRNA synthetase domain from the C-terminal prolyl-tRNA synthetase domain.

- Drosophila multifunctional aminoacyl-tRNA synthetase. The domain is present six times in the intercatalytic region.

- Mammalian tryptophanyl-tRNA synthetase. The domain is found at the N-terminal extremity.

- Mammalian, insect, nematode and plant glycyl-tRNA synthetase. The domain is found at the N-terminal extremity.

- Mammalian histidyl-tRNA synthetase. The domain is found at the N-terminal extremity.

[1629] This domain, which is called WHEP-TRS, could contain a central alpha-helical region and may play a role in the association of tRNA-synthetases into multienzyme complexes.

[1630] A signature pattern based on the first 29 positions of the WHEP-TRS domain has been developed.

- Consensus pattern: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K

[ 1] Cerini C., Kerjan P., Astier M., Gratecos D., Mirande M., Semeriva M. EMBO J. 10:4267-4277(1991).
[ 2] Nada S., Chang P.K., Dignam J.D.
J. Biol. Chem. 268:7660-7667(1993).

[1631] 719. (Worm family 8) Putative membrane protein
Analysis of protein domain families in Caenorhabditis elegans.
Sonnhammer EL, Durbin R; 
Genomics 1997;46:200-216. 

This family, called family 8 in [1], may be a transmembrane protein. The specific function of this protein is unknown.
[1632] 720. Xylose isomerase

Xylose isomerase (EC 5.3.1.5) [1] is an enzyme found in microorganisms which catalyzes the interconversion of D-xylose and D-xylulose. It can also isomerize D-ribose to D-ribulose and D-glucose to D-fructose. Xylose isomerase seems to require magnesium for its activity, while cobalt is necessary to stabilize the tetrameric structure of the enzyme. A number of residues are conserved in all known xylose isomerases.

[1633] Xylose isomerase also exists in plants [2], where it is homodimeric and manganese-dependent.
[1634] Two signature patterns for xylose isomerase have been developed. The first one is derived from a stretch of five conserved amino acids that includes a glutamic acid residue known to be one of the four residues involved in the binding of the magnesium ion [3]; this pattern also includes a lysine residue which is involved in the catalytic activity. The second pattern is derived from a conserved region in the N-terminal section of the enzyme that includes a histidine residue which has been shown [4] to be involved in the catalytic mechanism of the enzyme.
- Consensus pattern: [LI]-E-P-K-P-x(2)-P
[E is a magnesium ligand]
[K is an active site residue]

- Consensus pattern: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE]
[H is an active site residue]



[ 1] Dauter Z., Dauter M., Hemker J., Witzel H., Wilson K.S.
FEBS Lett. 247:1-8(1989).

[ 2] Kristo P.A., Saarelainen R., Fagerstrom R., Aho S., Korhola M.
Eur. J. Biochem. 237:240-246(1996).

[ 3] Henrick K., Collyer C.A., Blow D.M.
J. Mol. Biol. 208:129-157(1989).

[ 4] Vangrysperre W., Ampe C., Kersters-Hilderson H., Tempst P.
Biochem. J. 263:195-199(1989).
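
The bracketed notes under the two xylose isomerase patterns point at individual residues of functional interest. The sketch below uses Python's `re` module and translates both patterns by hand. Named groups let each annotated residue be reported with its position in the sequence. The group names, the helper `annotated_residues` and the synthetic test sequence are hypothetical.

```python
import re

# Hand translations of the two xylose isomerase patterns; named groups mark
# the residues annotated in the text (the group names are our own labels).
XYLOSE_ISOMERASE_PATTERNS = [
    re.compile(r"[LI](?P<E_Mg_ligand>E)P(?P<K_active_site>K)P.{2}P"),
    re.compile(r"[FL](?P<H_active_site>H)D.D[LIV].[PD].[GDE]"),
]


def annotated_residues(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Map each annotation label to (1-based position, residue) pairs."""
    found: dict[str, list[tuple[int, str]]] = {}
    for pattern in XYLOSE_ISOMERASE_PATTERNS:
        for match in pattern.finditer(sequence):
            for label in pattern.groupindex:  # names of the annotated groups
                found.setdefault(label, []).append(
                    (match.start(label) + 1, match.group(label))
                )
    return found


# Synthetic sequence containing one hit for each pattern.
print(annotated_residues("AAFHDLDIAPAGSSLEPKPGQPAA"))
```

For the synthetic sequence, the sketch reports the putative magnesium-binding Glu at position 16, the active-site Lys at 18 and the active-site His at 4. With named groups, adding another annotated pattern only means adding one compiled regex to the list.
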



[1635] 721. XPG protein signatures

Xeroderma pigmentosum (XP) [1] is a human autosomal recessive disease characterized by a high incidence of sunlight-induced skin cancer. The skin cells of people with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. The defect in XP-G can be corrected by a 133 Kd nuclear protein called XPG (or XPGC) [2]. XPG belongs to a family of proteins [2,3,4,5,6] that are composed
of two main subsets:

- Subset 1, which includes XPG, RAD2 from budding yeast and rad13 from fission yeast. RAD2 and XPG are single-stranded DNA endonucleases [7,8]. XPG makes the 3' incision in human DNA nucleotide excision repair [9].
- Subset 2, which includes mouse and human FEN-1, rad2 from fission yeast, and RAD27 from budding yeast. FEN-1 is a structure-specific endonuclease.

In addition to the proteins listed in the above groups, this family also includes:

- Fission yeast exo1, a 5'->3' double-stranded DNA exonuclease that could act in a pathway that corrects mismatched base pairs.
- Yeast EXO1 (DHS1), a protein with probably the same function as exo1.
- Yeast DIN7.

Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C-terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]). It is possible that the conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N- and I-regions are not conserved; indeed, they are largely absent from proteins belonging to the second subset. Two signature patterns have been developed for these proteins. The first corresponds to the central part of the N-region, the second to part of the I-region, and it includes the putative catalytic core pentapeptide.
[1636] Consensus pattern: [VI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC]-K-
Consensus pattern: [GS]-[LIVM]-[PERHFYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QSHCLM]-

[1637] [ 1] Tanaka K., Wood R.D. Trends Biochem. Sci. 19:83-86(1994).
[ 2] Scherly D., Nouspikel T., Corlet J., Ucla C., Bairoch A., Clarkson S.G. Nature 363:182-185(1993).
[ 3] Carr A.M., Sheldrick K.S., Murray J.M., Al-Harithy R., Watts F.Z., Lehmann A.R. Nucleic Acids Res. 21:1345-1349(1993).
[ 4] Murray J.M., Tavassoli M., Al-Harithy R., Sheldrick K.S., Lehmann A.R., Carr A.M., Watts F.Z. Mol. Cell. Biol. 14:4878-4888(1994).
[ 5] Harrington J.J., Lieber M.R. Genes Dev. 8:1344-1355(1994).
[ 6] Szankasi P., Smith G.R. Science 267:1166-1169(1995).
[ 7] Habraken Y., Sung P., Prakash L., Prakash S. Nature 366:365-368(1993).
[ 8] O'Donovan A., Scherly D., Clarkson S.G., Wood R.D. J. Biol. Chem. 269:15965-15968(1994).
[ 9] O'Donovan A., Davies A.A., Moggs J.G., West S.C., Wood R.D. Nature 371:432-435(1994).

[1638] 722. Xanthine/uracil permeases family 

The following transport proteins, which are involved in the uptake of xanthine or uracil, are evolutionarily related [1]:

- Uric acid-xanthine permease (gene uapA) from Aspergillus nidulans.

- Purine permease (gene uapC) from Aspergillus nidulans.

- Xanthine permease from Bacillus subtilis (gene pbuX).

- Uracil permease from Escherichia coli (gene uraA) [2] and Bacillus (gene pyrP).

- Hypothetical protein ycdG from Escherichia coli.

- Hypothetical protein ygfO from Escherichia coli.

- Hypothetical protein ygfU from Escherichia coli.

- Hypothetical protein yicE from Escherichia coli.

- Hypothetical protein yunJ from Bacillus subtilis.

- Hypothetical protein yunK from Bacillus subtilis.

[1639] They are proteins of 430 to 595 residues that seem to contain 12 transmembrane domains. The best conserved region, which corresponds to what seems to be the tenth transmembrane domain, has been selected as a signature pattern.

- Consensus pattern: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-[GSA]-x-[LIVM]-x(3)-G

[ 1] Diallinas G., Gorfinkiel L., Arst G., Cecchetto G., Scazzocchio C. J. Biol. Chem. 270:8610-8622(1995).
[ 2] Andersen P.S., Frees D., Fast R., Mygind B. J. Bacteriol. 177:2008-2013(1995).

[1640] 723. Hypothetical yabO/yceC/sfhB family

The following proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], have been shown to share regions of similarity:

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase A (gene rluA). It is responsible for synthesis of pseudouridine from uracil-746 in 23S rRNA.

- Escherichia coli and Haemophilus influenzae ribosomal large subunit pseudouridine synthase C (gene rluC). It is responsible for synthesis of pseudouridine from uracil at positions 955, 2504 and 2580 in 23S rRNA.

- Escherichia coli protein and homologs in other bacteria: large subunit pseudouridine synthase D (gene rluD).

- Yeast DRAP deaminase (gene RIB2).

- Escherichia coli hypothetical protein yqcB and HI1435, the corresponding Haemophilus influenzae protein.
- Haemophilus influenzae hypothetical protein HI0042.
- Aquifex aeolicus hypothetical protein AQ_1758.
- Bacillus subtilis hypothetical protein yhcT.
- Bacillus subtilis hypothetical protein yjbO.
- Bacillus subtilis hypothetical protein ylyB.
- Helicobacter pylori hypothetical protein HP0347.
- Helicobacter pylori hypothetical protein HP0745.
- Helicobacter pylori hypothetical protein HP0956.
- Mycoplasma genitalium hypothetical protein MG209.
- Mycoplasma genitalium hypothetical protein MG370.
- Synechocystis strain PCC 6803 hypothetical protein slr1592.
- Synechocystis strain PCC 6803 hypothetical protein slr1629.
- Yeast hypothetical protein YDL036c.
- Yeast hypothetical protein YGR169c.
- Fission yeast hypothetical protein SpAC18811.02c.
- Caenorhabditis elegans hypothetical protein K07E8.7.

[1641] These are proteins of 21 to 50 Kd which contain a number of conserved regions in their central section. They can be picked up in the database by the following highly conserved pattern.

- Consensus pattern: [LIVCA]-[NHYT]-R-[LI]-D-x(2)-T-[STA]-G-[LIVAGC]-[LIVMF](2)-[LIVM]

[1642] [ 1] Conrad J., Sun D., Englund N., Ofengand J. J. Biol. Chem. 273:18562-18566(1998).

[1643] In addition, the following bacterial proteins, which seem to belong to a family of pseudouridine synthases (EC 4.2.1.70) [1], have also been shown to share regions of similarity:

- Escherichia coli and Haemophilus influenzae 16S pseudouridylate 516 synthase (EC 4.2.1.70) (gene: rsuA). This enzyme is responsible for the formation of pseudouridine from uracil-516 in 16S ribosomal RNA.

- Escherichia coli hypothetical protein yciL and HI1199, the corresponding Haemophilus influenzae protein.

- Escherichia coli hypothetical protein yjbC.

- Escherichia coli hypothetical protein ymfC and HI0694, the corresponding Haemophilus influenzae protein.

- Aquifex aeolicus hypothetical protein AQ_554.

- Aquifex aeolicus hypothetical protein AQ_1464.

- Bacillus subtilis hypothetical protein ypuL.

- Bacillus subtilis hypothetical protein ytzF.

- Borrelia burgdorferi hypothetical protein BB0129.

- Helicobacter pylori hypothetical protein HP1459.

- Synechocystis strain PCC 6803 hypothetical protein slr0361.

- Synechocystis strain PCC 6803 hypothetical protein slr0612.

[1644] These are proteins of 25 to 40 Kd which contain a number of conserved regions in their central section. They can be picked up in the database by the following highly conserved pattern.

- Consensus pattern: G-R-L-D-x(2)-[STA]-x-G-[LIVFA]-[LIVMF](3)-[ST]-[DNST]

[1645] [ 1] Wrzesinski J., Bakin A., Nurse K., Lane B.G., Ofengand J. Biochemistry 34:8904-8913(1995).
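
Both pseudouridine synthase signatures above are intended for picking family members out of a sequence database. The sketch below shows such a scan with Python's `re` module. Each pattern is translated into a regular expression by hand; for the first pattern we use our own reading wherever the print is damaged. Every sequence in a small in-memory collection is then tagged with the signatures it matches. The labels `RluA-like` and `RsuA-like`, the `scan` helper and the test sequences are hypothetical.

```python
import re

# Hand translations of the two signatures; the labels are our own.
SIGNATURES = {
    "RluA-like": re.compile(
        r"[LIVCA][NHYT]R[LI]D.{2}T[STA]G[LIVAGC][LIVMF]{2}[LIVM]"),
    "RsuA-like": re.compile(
        r"GRLD.{2}[STA].G[LIVFA][LIVMF]{3}[ST][DNST]"),
}


def scan(database: dict[str, str]) -> dict[str, list[str]]:
    """Return, for each sequence name, the signatures found in it."""
    return {
        name: [label for label, rx in SIGNATURES.items() if rx.search(seq)]
        for name, seq in database.items()
    }


# Synthetic sequences, built only to exercise the two patterns.
toy_db = {
    "seqA": "MAKLNRLDKGTSGLLLVEEK",
    "seqB": "MSGRLDKNSAGLLLLTNGE",
    "seqC": "MKKKEEEPPP",
}
print(scan(toy_db))
```

A real database scan would read sequences from a file rather than a dictionary. The matching step stays the same, and reporting every signature per sequence makes it easy to spot entries hit by both patterns.
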
[1646] 724. Zinc finger present in dystrophin, CBP/p300 
ZZ in dystrophin binds calmodulin.
Putative zinc finger; binding not yet shown.
[1647] 725. Zinc carboxypeptidase 

There are a number of different types of zinc-dependent carboxypeptidases (EC 3.4.17.-) [1,2]. All these enzymes seem to be structurally and functionally related. The enzymes that belong to this family are listed below.

- Carboxypeptidase A1 (EC 3.4.17.1), a pancreatic digestive enzyme that can remove all C-terminal amino acids with the exception of Arg, Lys and Pro.

- Carboxypeptidase A2 (EC 3.4.17.15), a pancreatic digestive enzyme with a specificity similar to that of carboxypeptidase A1, but with a preference for bulkier C-terminal residues.

- Carboxypeptidase B (EC 3.4.17.2), also a pancreatic digestive enzyme, but one that preferentially removes C-terminal Arg and Lys.

- Carboxypeptidase N (EC 3.4.17.3) (also known as arginine carboxypeptidase), a plasma enzyme which protects the body from potent vasoactive and inflammatory peptides containing C-terminal Arg or Lys (such as kinins or anaphylatoxins) which are released into the circulation.

- Carboxypeptidase H (EC 3.4.17.10) (also known as enkephalin convertase or carboxypeptidase E), an enzyme
located in secretory granules of pancreatic islets, adrenal gland, pituitary and brain. This enzyme removes residual 
C-terminal Arg or Lys remaining after initial endoprotease cleavage during prohormone processing. 

- Carboxypeptidase M (EC 3.4.17.12), a membrane-bound Arg- and Lys-specific enzyme. It is ideally situated to act on peptide hormones at local tissue sites where it could control their activity before or after interaction with specific plasma membrane receptors.

- Mast cell carboxypeptidase (EC 3.4.17.1), an enzyme with a specificity similar to that of carboxypeptidase A, but found in the secretory granules of mast cells.

- Streptomyces griseus carboxypeptidase (Cpase SG) (EC 3.4.17.-) [3], which combines the specificities of mammalian carboxypeptidases A and B.

- Thermoactinomyces vulgaris carboxypeptidase T (EC 3.4.17.18) (CPT) [4], which also combines the specificities of carboxypeptidases A and B.

- AEBP1 [5], a transcriptional repressor active in preadipocytes. AEBP1 seems to regulate transcription by cleavage 
of other transcriptional proteins. 

- Yeast hypothetical protein YHR132c.

[1648] All of these enzymes bind an atom of zinc. Three conserved residues are implicated in the binding of the zinc atom: two histidines and a glutamic acid. Two signature patterns which contain these three zinc ligands have been derived.

- Consensus pattern: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA] [H and E are zinc ligands]

- Consensus pattern: H-[STAG]-x(3)-[LIVME]-x(2)-[LIVMFYW]-P-[FYW] [H is a zinc ligand]

[ 1] Tan F., Chan S.J., Steiner D.F., Schilling J.W., Skidgel R.A.
J. Biol. Chem. 264:13165-13170(1989).

[ 2] Reynolds D.S., Stevens R.L., Gurley D.S., Lane W.S., Austen K.F., Serafin W.E.
J. Biol. Chem. 264:20094-20099(1989).

[ 3] Narahashi Y.
J. Biochem. 107:879-886(1990).

[ 4] Teplyakov A., Polyakov K., Obmolova G., Strokopytov B., Kuranova I., Osterman A.L., Grishin N.V., Smulevitch S.V., Zagnitko O.P., Galperina O.V., Matz M.V., Stepanov V.M.
Eur. J. Biochem. 208:281-288(1992).

[ 5] He G.-P., Muise A., Li A.W., Ro H.-S.
Nature 378:92-96(1995).

[ 6] Hourdou M.-L., Guinand M., Vacheron M.J., Michel G., Denoroy L., Duez C.M., Englebert S., Joris B., Weber G., Ghuysen J.-M.
Biochem. J. 292:563-570(1993).

[ 7] Rawlings N.D., Barrett A.J.
Meth. Enzymol. 248:183-228(1995).

[1649] 726. Zinc finger, C2H2 type

The C2H2 zinc finger is the classical zinc finger domain. 

The two conserved cysteines and histidines coordinate a zinc ion. The following pattern describes the zinc finger:
#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]

Where X can be any amino acid, and numbers in brackets indicate the number of residues. The positions marked # are those that are important for the stable fold of the zinc finger. The final position can be either His or Cys.
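
The zinc finger notation above can be turned into a regular expression for scanning sequences. The text does not say which residues may occupy the # positions. As an assumption, noted in the code, this sketch accepts any aromatic or aliphatic residue there. That choice follows the remark elsewhere in this entry that the best-conserved structural position is usually aromatic or aliphatic. The helper `find_c2h2` and the finger-like test sequence are illustrative only.

```python
import re

# Assumption: '#' positions accept an aromatic or aliphatic residue.
HASH = "[AFILMVWY]"

# #-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C], written as a regex.
C2H2 = re.compile(
    HASH + r".C.{1,5}C.{3}" + HASH + r".{5}" + HASH + r".{2}H.{3,6}[HC]"
)


def find_c2h2(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for non-overlapping hits."""
    return [(m.start() + 1, m.group()) for m in C2H2.finditer(sequence)]


print(find_c2h2("MERPYACPVESCDRRFSRSDELTRHIRIHTGQKP"))
```

The call prints `[(5, 'YACPVESCDRRFSRSDELTRHIRIH')]`. Loosening `HASH` to `.` reproduces the pattern with # treated as an unconstrained position. That version finds more candidates but also more false positives.
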
The C2H2 zinc finger is composed of two short beta strands followed by an alpha helix. The amino terminal part of the 
helix binds the major groove in DNA binding zinc fingers. 

[1650] 'Zinc finger' domains [1-5] are nucleic acid-binding protein structures first identified in the Xenopus transcription factor TFIIIA. These domains have since been found in numerous nucleic acid-binding proteins. A zinc finger domain is composed of 25 to 30 amino-acid residues. There are two cysteine or histidine residues at both extremities of the domain, which are involved in the tetrahedral coordination of a zinc atom. It has been proposed that such a domain interacts with about five nucleotides. A schematic representation of a zinc finger domain is shown below:



          X     X
          X     X
          X     X
          X     X
          X     X
          X     X
          C     H
        X  \   /  X
        X    Zn   X
        X  /   \  X
          C     H
     XXXXX       XXXXX



[1651] Many classes of zinc fingers are characterized according to the number and positions of the histidine and 
cysteine residues involved in the zinc atom coordination. In the first class to be characterized, called C2H2, the first 
pair of zinc coordinating residues are cysteines, while the second pair are histidines. A number of experimental reports 
have demonstrated the zinc-dependent DNA or RNA binding property of some members of this class. 
[1652] Some of the proteins known to include C2H2-type zinc fingers are listed below. The number of zinc finger regions found in each of these proteins is indicated between brackets; a '+' symbol indicates that only partial sequence data is available and that additional finger domains may be present.

- Saccharomyces cerevisiae: ACE2 (3), ADR1 (2), AZF1 (4), FZF1 (5), MIG1 (2), MSN2 (2), MSN4 (2), RGM1 (2), RIM1 (3), RME1 (3), SFP1 (2), SSL1 (1), STP1 (3), SWI5 (3), VAC1 (1) and ZMS1 (2).

- Emericella nidulans: brlA (2), creA (2).

- Drosophila: AEF-1 (4), Cf2 (7), ci-D (5), Disconnected (2), Escargot (5), Glass (5), Hunchback (6), Kruppel (5), Kruppel-H (4+), Odd-skipped (4), Odd-paired (4), Pep (3), Snail (5), Spalt-major (7), Serendipity locus beta (6), delta (7), h-1 (8), Suppressor of hairy wing su(Hw) (12), Suppressor of variegation suvar(3)7 (5), Teashirt (3) and Tramtrack (2).

- Xenopus: transcription factor TFIIIA (9), p43 from RNP particle (9), Xfin (37 !!), Xsna (5), gastrula XlcGF5.1 to XlcGF71.1 (from 4+ to 11+), oocyte XlcOF2 to XlcOF22 (from 7 to 12).

- Mammalian: basonuclin (6), BCL-6/LAZ-3 (6), erythroid krueppel-like transcription factor (3), transcription factors Sp1 (3), Sp2 (3), Sp3 (3) and Sp4 (3), transcriptional repressor YY1 (4), Wilms' tumor protein (4), EGR1/Krox24 (3), EGR2/Krox20 (3), EGR3/Pilot (3), EGR4/AT133 (4), Evi-1 (10), GLI1 (5), GLI2 (4+), GLI3 (3+), HIV-EP1/ZNF40 (4), HIV-EP2 (2), KR1 (9+), KR2 (9), KR3 (15+), KR4 (14+), KR5 (11+), HF.12 (6+), REX-1 (4), ZfX (13), ZfY (13), Zfp-35 (18), ZNF7 (15), ZNF8 (7), ZNF35 (10), ZNF42/MZF-1 (13), ZNF43 (22), ZNF46/Kup (2), ZNF76 (7), ZNF91 (36), ZNF133 (3).

[1653] In addition to the conserved zinc ligand residues, it has been shown [6] that a number of other positions are also important for the structural integrity of the C2H2 zinc fingers. The best conserved position is found four residues after the second cysteine; it is generally an aromatic or aliphatic residue.

- Consensus pattern: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H [The two C's and two H's are zinc ligands]
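The consensus patterns in this section are written in PROSITE syntax: elements separated by '-', 'x' for any residue, square brackets for allowed residues, curly braces for forbidden residues, and counts or ranges in parentheses. A minimal Python sketch that translates that subset into a regular expression follows. The function name `prosite_to_regex` is hypothetical, the converter does not cover every feature of the full PROSITE syntax, and the test peptide is synthetic, not a sequence from this document.

```python
import re

# One PROSITE element: optional '<' anchor, a residue spec, optional (n) or (n,m), optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regular expression.

    Covers only the subset used in this section: 'x', [allowed], {forbidden},
    (n) / (n,m) repeat counts and the '<' / '>' terminal anchors.
    """
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        start, core, lo, hi, end = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # forbidden residues -> negated class
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        pieces.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(pieces)


C2H2 = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(C2H2)
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H

# Synthetic test peptide (illustrative only).
hit = re.search(regex, "AAYKCPECGKSFSQSSNLQKHQRTHTGEK")
print(hit.start(), hit.group())  # 4 CPECGKSFSQSSNLQKHQRTH
```

Splitting on '-' is safe here because PROSITE writes ranges with a comma, x(2,4), so a hyphen always separates elements.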

[ 1] Klug A., Rhodes D.

Trends Biochem. Sci. 12:464-469(1987).

[ 2] Evans R.M., Hollenberg S.M.

Cell 52:1-3(1988).

[ 3] Payre F., Vincent A.

FEBS Lett. 234:245-250(1988).

[ 4] Miller J., McLachlan A.D., Klug A.

EMBO J. 4:1609-1614(1985).

[ 5] Berg J.M.

Proc. Natl. Acad. Sci. U.S.A. 85:99-102(1988).

[ 6] Rosenfeld R., Margalit H.

J. Biomol. Struct. Dyn. 11:557-570(1993).

[1654] 727. Zinc finger, C3HC4 type (RING finger) 

A number of eukaryotic and viral proteins contain a conserved cysteine-rich domain of 40 to 60 residues (called C3HC4 zinc finger or 'RING' finger) [1] that binds two atoms of zinc and is probably involved in mediating protein-protein interactions. The 3D structure of the zinc ligation system is unique to the RING domain and is referred to as the "cross-brace" motif. The spacing of the cysteines in such a domain is C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C.
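The RING spacing rule above can be read as a single regular expression in which each of the eight zinc-binding residues sits in its own capture group, so their positions can be read back after a match. This is a sketch under that reading only: the names `RING_SPACING` and `ring_ligands` are hypothetical, and the demonstration sequence is synthetic, assembled from spacer lengths that fall inside the stated ranges.

```python
import re

# C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C,
# with every zinc ligand wrapped in a capture group.
RING_SPACING = re.compile(
    r"(C).{2}(C).{9,39}(C).{1,3}(H).{2,3}(C).{2}(C).{4,48}(C).{2}(C)"
)


def ring_ligands(seq):
    """Return [(residue, 1-based position), ...] for the first RING-like match, or None."""
    m = RING_SPACING.search(seq)
    if m is None:
        return None
    return [(m.group(i), m.start(i) + 1) for i in range(1, 9)]


# Synthetic sequence: spacers of 2, 10, 1, 2, 2, 5 and 2 residues between the ligands.
demo_seq = ("M" + "C" + "AA" + "C" + "A" * 10 + "C" + "A" + "H"
            + "AA" + "C" + "AA" + "C" + "A" * 5 + "C" + "AA" + "C")
print(ring_ligands(demo_seq))
```

The wide gaps (9 to 39 and 4 to 48 residues) make this spacing rule permissive, which is one reason the section relies on the shorter, stricter central signature for detection.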

[1655] Proteins currently known to include the C3HC4 domain are listed below (references are only provided for 
recently determined sequences). 



- Mammalian V(D)J recombination activating protein (gene RAG1). RAG1 activates the rearrangement of immunoglobulin and T-cell receptor genes.

- Mouse rpt-1. Rpt-1 is a trans-acting factor that regulates gene expression directed by the promoter region of the interleukin-2 receptor alpha chain or the LTR promoter region of HIV-1.

- Human rfp. Rfp is a developmentally regulated protein that may function in male germ cell development. Recombination of the N-terminal section of rfp with a protein tyrosine kinase produces the ret transforming protein.
- Human 52 kDa Ro/SS-A protein. A protein of unknown function from the Ro/SS-A ribonucleoprotein complex. Sera from patients with systemic lupus erythematosus or primary Sjogren's syndrome often contain antibodies that react with the Ro proteins.

- Human histocompatibility locus protein RING1.

- Human PML, a probable transcription factor. Chromosomal translocation of PML with the retinoic acid receptor alpha creates a fusion protein that causes acute promyelocytic leukemia (APL).
- Mammalian breast cancer type 1 susceptibility protein (BRCA1) [E1].
- Mammalian cbl proto-oncogene.
- Mammalian bmi-1 proto-oncogene.

- Vertebrate CDK-activating kinase (CAK) assembly factor MAT1, a protein that stabilizes the complex between the CDK7 kinase and cyclin H (MAT1 stands for 'Menage A Trois').

- Mammalian mel-18 protein. Mel-18, which is expressed in a variety of tumor cells, is a transcriptional repressor that recognizes and binds a specific DNA sequence.

- Mammalian peroxisome assembly factor-1 (PAF-1) (PMP35), which is somehow involved in the biogenesis of peroxisomes. In humans, defects in PAF-1 are responsible for a form of Zellweger syndrome, an autosomal recessive disorder associated with peroxisomal deficiencies.
- Human MAT1 protein, which interacts with the CDK7-cyclin H complex.
- Human RING1 protein.

- Xenopus XNF7 protein, a probable transcription factor.

- Trypanosoma protein ESAG-8 (T-LR), which may be involved in the posttranscriptional regulation of genes in VSG expression sites or may interact with adenylate cyclase to regulate its activity.

- Drosophila proteins Posterior Sex Combs (Psc) and Suppressor two of zeste (Su(z)2). The two proteins belong to the Polycomb group of genes needed to maintain the segment-specific repression of homeotic selector genes.
- Drosophila protein male-specific msl-2, a DNA-binding protein which is involved in X chromosome dosage compensation (the elevation of transcription of the male single X chromosome).
- Arabidopsis thaliana protein COP1, which is involved in the regulation of photomorphogenesis.
- Fungal DNA repair proteins RAD5, RAD16, RAD18 and rad8.

- Herpesviruses trans-acting transcriptional protein ICP0/IE110. This protein, which has been characterized in many different herpesviruses, is a trans-activator and/or -repressor of the expression of many viral and cellular promoters.
- Baculoviruses protein CG30.

- Baculoviruses major immediate early protein (PE-38).
- Baculoviruses immediate-early regulatory protein IE-N/IE-2.

- Caenorhabditis elegans hypothetical proteins F54G8.4, R05D3.4 and T02C1.1.

- Yeast hypothetical proteins YER116C and YKR017C.

[1656] The central region of the domain was selected as a signature pattern for the C3HC4 finger.

- Consensus pattern: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]

[1657] [ 1] Borden K.L.B., Freemont P.S.

Curr. Opin. Struct. Biol. 6:395-401(1996). 

[1658] 728. Zinc finger C-x8-C-x5-C-x3-H type (and similar). 

[1659] 729. Zinc finger, CCHC class 

A family of CCHC zinc fingers, mostly from retroviral gag proteins (nucleocapsid). The prototype structure is from HIV. The family also contains members involved in eukaryotic gene regulation, such as C. elegans GLH-1.
The structure is an 18-residue zinc finger; there are no examples of indels in the alignment.
[1660] 730. Zn-finger in Ran binding protein and others.
[1661] 731. AN1-like zinc finger

[1662] Zinc finger at the C-terminus of An1 (Swiss:Q91889), a ubiquitin-like protein in Xenopus laevis. The following pattern describes the zinc finger:

C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C

where X can be any amino acid and the numbers give the number of residues (a range where shown in brackets).
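The AN1-like pattern, like the C2H2 pattern given earlier in this section, uses a looser spacing notation than PROSITE: 'X2' for exactly two arbitrary residues, 'X(9-12)' for a range written with a hyphen, '#' for a structurally important but unspecified position, and '[H/C]' for alternatives. The hypothetical helper below translates that notation into a regular expression. Treating '#' as "any residue" is an assumption made for matching purposes; the notation itself does not restrict those positions.

```python
import re


def spacing_to_regex(pattern: str) -> str:
    """Translate the 'X2 / X(9-12) / # / [H/C]' spacing notation into a regex (hypothetical helper)."""
    pieces = []
    # Split on '-' except when the hyphen sits inside parentheses, e.g. X(9-12).
    for token in re.split(r"-(?![^(]*\))", pattern.strip()):
        m = re.fullmatch(r"[X#](?:(\d+)|\((\d+)-(\d+)\))?", token)
        if m:
            exact, lo, hi = m.groups()
            if exact:
                pieces.append(".{%s}" % exact)        # X2 -> exactly two residues
            elif lo:
                pieces.append(".{%s,%s}" % (lo, hi))  # X(9-12) -> 9 to 12 residues
            else:
                pieces.append(".")                    # bare X or # -> one residue
        elif re.fullmatch(r"\[[A-Z/]+\]", token):
            pieces.append("[" + token[1:-1].replace("/", "") + "]")  # [H/C] -> [HC]
        elif re.fullmatch(r"[A-Z]", token):
            pieces.append(token)                      # fixed residue such as C or H
        else:
            raise ValueError(f"unrecognised element: {token!r}")
    return "".join(pieces)


AN1 = "C-X2-C-X(9-12)-C-X(1-2)-C-X4-C-X2-H-X5-H-X-C"
C2H2_SPACING = "#-X-C-X(1-5)-C-X3-#-X5-#-X2-H-X(3-6)-[H/C]"
print(spacing_to_regex(AN1))           # C.{2}C.{9,12}C.{1,2}C.{4}C.{2}H.{5}H.C
print(spacing_to_regex(C2H2_SPACING))  # ..C.{1,5}C.{3}..{5}..{2}H.{3,6}[HC]
```

The negative lookahead in the split keeps the hyphen of a range such as X(9-12) from being mistaken for an element separator.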

[1663] [1] Linnen JM, Bailey CP, Weeks DL; Gene 1993;128:181-188.

[1664] 732. 14-3-3 proteins 

Structure of a 14-3-3 protein and implications for coordination of multiple signalling pathways. 

Xiao B, Smerdon SJ, Jones DH, Dodson GG, Soneji Y, Aitken A, Gamblin SJ; Nature 1995;376:188-191.

Crystal structure of the zeta isoform of the 14-3-3 protein.

Liu D, Bienkowska J, Petosa C, Collier RJ, Fu H, Liddington R;

Nature 1995;376:191-194.

[1665] Interaction of 14-3-3 with signaling proteins is mediated by the recognition of phosphoserine.
Muslin AJ, Tanner JW, Allen PM, Shaw AS;
Cell 1996;84:889-897.

[1666] The 14-3-3 protein binds its target proteins with a common site located towards the C-terminus.
Ichimura T, Ito M, Itagaki C, Takahashi M, Horigome T, Omata S, Ohno S, Isobe T;
FEBS Lett 1997;413:273-276.

[1667] Molecular evolution of the 14-3-3 protein family.

Wang W, Shakes DC;

J Mol Evol 1996;43:384-398.

Function of 14-3-3 proteins.

Jin DY, Lyu MS, Kozak CA, Jeang KT;

Nature 1996;382:308-308.

[1668] The 14-3-3 proteins [1,2,3] are a family of closely related acidic homodimeric proteins of about 30 kDa, which were first identified as being very abundant in mammalian brain tissues and located preferentially in neurons. The 14-3-3 proteins seem to have multiple biological activities and play a key role in signal transduction pathways and the cell cycle. They interact with kinases such as PKC or Raf-1; they also seem to function as protein-kinase-dependent activators of tyrosine and tryptophan hydroxylases; and in plants they are associated with a complex that binds to the G-box promoter elements.

[1669] The 14-3-3 family of proteins is found ubiquitously in all eukaryotic species studied and has been sequenced in fungi (yeast BMH1 and BMH2, fission yeast rad24 and rad25), plants, Drosophila, and vertebrates. The sequences of the 14-3-3 proteins are extremely well conserved. Two highly conserved regions have been selected as signature patterns: the first is a peptide of 11 residues located in the N-terminal section; the second, a 20-amino-acid region located in the C-terminal section.

- Consensus pattern: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]

- Consensus pattern: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD]
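The text states that the first 14-3-3 signature spans 11 residues and the second a 20-residue region. Summing the repeat counts of a PROSITE-style pattern gives the shortest and longest stretch it can match, which is a quick consistency check on a transcribed pattern. The sketch below assumes the two patterns read as written in its comments; `prosite_length_range` is a hypothetical name.

```python
import re

# One PROSITE element with an optional (n) or (n,m) repeat count.
ELEMENT = re.compile(r"<?(?:x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?>?")


def prosite_length_range(pattern):
    """Return the (shortest, longest) number of residues a PROSITE-style pattern can span."""
    shortest = longest = 0
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        lo, hi = m.groups()
        lo = int(lo) if lo else 1   # no count means exactly one residue
        hi = int(hi) if hi else lo  # x(n) means exactly n residues
        shortest += lo
        longest += hi
    return shortest, longest


N_TERMINAL = "R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA]"
C_TERMINAL = "Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD]"
print(prosite_length_range(N_TERMINAL))  # (11, 11)
print(prosite_length_range(C_TERMINAL))  # (20, 20)
```

For a pattern with variable gaps, such as the C2H2 signature C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, the same function reports a range, here (21, 25).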
[ 1] Aitken A.

Trends Biochem. Sci. 20:95-97(1995).

[ 2] Morrison D.

Science 266:56-57(1994).

[ 3] Xiao B., Smerdon S.J., Jones D.H., Dodson G.G., Soneji Y., Aitken A., Gamblin S.J.
Nature 376:188-191(1995). 

[1670] 733. D-isomer specific 2-hydroxyacid dehydrogenases (2 Hacid DH) 

This Pfam family covers the formate dehydrogenase, D-glycerate dehydrogenase and D-lactate dehydrogenase families in SCOP. A number of NAD-dependent 2-hydroxyacid dehydrogenases, which seem to be specific for the D-isomer of their substrate, have been shown [1,2,3,4] to be functionally and structurally related. These enzymes are listed below.

- D-lactate dehydrogenase (EC 1.1.1.28), a bacterial enzyme that catalyzes the reduction of pyruvate to D-lactate.

- D-glycerate dehydrogenase (EC 1.1.1.29) (NADH-dependent hydroxypyruvate reductase), a plant leaf peroxisomal enzyme that catalyzes the reduction of hydroxypyruvate to glycerate. This reaction is part of the glycolate pathway of photorespiration.

- D-glycerate dehydrogenase from the bacteria Hyphomicrobium methylovorum and Methylobacterium extorquens.

- 3-phosphoglycerate dehydrogenase (EC 1.1.1.95), a bacterial enzyme that catalyzes the oxidation of D-3-phosphoglycerate to 3-phosphohydroxypyruvate. This reaction is the first committed step in the 'phosphorylated' pathway of serine biosynthesis.

- Erythronate-4-phosphate dehydrogenase (EC 1.1.1.-) (gene pdxB), a bacterial enzyme involved in the biosynthesis of pyridoxine (vitamin B6).

- D-2-hydroxyisocaproate dehydrogenase (EC 1.1.1.-) (D-hicDH), a bacterial enzyme that catalyzes the reversible and stereospecific interconversion between 2-ketocarboxylic acids and D-2-hydroxycarboxylic acids.

- Formate dehydrogenase (EC 1.2.1.2) (FDH) from the bacterium Pseudomonas sp. 101 and various fungi [5].

- Vancomycin resistance protein vanH from Enterococcus faecium; this protein is a D-specific alpha-keto acid dehydrogenase involved in the formation of a peptidoglycan that does not terminate in D-alanine, thus preventing vancomycin binding.

- Escherichia coli hypothetical protein ycdW.
- Escherichia coli hypothetical protein yiaE.

- Haemophilus influenzae hypothetical protein HI1556.

- Yeast hypothetical protein YER081w.
- Yeast hypothetical protein YIL074w.

[1671] All these enzymes have similar enzymatic activities and are structurally related. Three of the most conserved regions of these proteins have been selected to develop patterns. The first pattern is based on a glycine-rich region located in the central section of these enzymes; this region probably corresponds to the NAD-binding domain. The two other patterns contain a number of conserved charged residues, some of which may play a role in the catalytic mechanism.

- Consensus pattern: [LIVMA]-[AG]-[l\n"HLIVMFY]-[AG]-x-G-[NHKRQGSAC].[LM-G-x(13,14)-rLIVfMTl-x(^^^^^ 
wCTH]-[DNSTK] ^ 

- Consensus pattern: [UVMFYWA]-[LIVFYWC]-x(2)-[SAC]-[DNQHRHIVFA]-[UW]-x-[UVFI-[HNI]-x-P-x( 
x(2)-[LIVMF]-x.[GSDN] ^ ^ 

- Consensus pattem: lLMFATC]-[KPQ]-x-[GSTDN]-x-{LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-fGPl-x-fLIVH 
[LIVMC]-[DNV] ^ 

[1] Grant G.A. Biochem. Biophys. Res. Commun. 165:1371-1374(1989).

[2] Kochhar S., Hunziker P., Leong-Morgenthaler P.M., Hottinger H. Biochem. Biophys. Res. Commun. 184:60-66(1992).

[3] Ohta T., Taguchi H. J. Biol. Chem. 266:12588-12594(1991).

[4] Goldberg J.D., Yoshida T., Brick P. J. Mol. Biol. 236:1123-1140(1994).

[5] Popov V.O., Lamzin V.S. Biochem. J. 301:625-643(1994).

[1672] 734. 2-oxo acid dehydrogenases acyltransferase (catalytic domain)

Refined crystal structure of the catalytic domain of dihydrolipoyl transacetylase (E2p) from Azotobacter vinelandii at 2.6 angstroms resolution.

Mattevi A, Obmolova G, Kalk KH, Westphal AH, De Kok A, Hol WG;



J Mol Biol 1993;230:1183-1199.

These proteins contain one to three copies of a lipoyl-binding domain followed by the catalytic domain.
[1673] 735. 3-beta hydroxysteroid dehydrogenase/isomerase family
Structure and tissue-specific expression of 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase genes in human and rat classical and peripheral steroidogenic tissues.

Labrie F, Simard J, Luu-The V, Pelletier G, Belanger A,
Lachance Y, Zhao HF, Labrie C, Breton N, de Launoit Y, et al.
J Steroid Biochem Mol Biol 1992;41:421-435.

The enzyme 3 beta-hydroxysteroid dehydrogenase/5-ene-4-ene isomerase (3 beta-HSD) catalyzes the oxidation and 
isomerization of 5-ene-3 beta-hydroxypregnene and 5-ene-hydroxyandrostene steroid precursors into the correspond- 
ing 4-ene-ketosteroids necessary for the formation of all classes of steroid hormones. 
[1674] 736. 3-hydroxyacyl-CoA dehydrogenase
This family also includes lambda crystallin.
Structure of L-3-hydroxyacyl-coenzyme A dehydrogenase: preliminary chain tracing at 2.8-A resolution.
Birktoft JJ, Holden HM, Hamlin R, Xuong NH, Banaszak LJ;
Proc Natl Acad Sci U S A 1987;84:8262-8266.

[1675] 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35) (HCDH) [1] is an enzyme involved in fatty acid metabolism. It catalyzes the oxidation of 3-hydroxyacyl-CoA to 3-oxoacyl-CoA. Most eukaryotic cells have two fatty-acid beta-oxidation systems, one located in mitochondria and the other in peroxisomes. In peroxisomes, 3-hydroxyacyl-CoA dehydrogenase forms, with enoyl-CoA hydratase (ECH) and 3,2-trans-enoyl-CoA isomerase (ECI), a multifunctional enzyme in which the N-terminal domain bears the hydratase/isomerase activities and the C-terminal domain the dehydrogenase activity. There are two mitochondrial enzymes: one which is monofunctional and the other which is, like its peroxisomal counterpart, multifunctional.

[1676] In Escherichia coli (gene fadB) and Pseudomonas fragi (gene faoA), HCDH is part of a multifunctional enzyme that also contains an ECH/ECI domain as well as a 3-hydroxybutyryl-CoA epimerase domain [2].
[1677] The other proteins structurally related to HCDH are: 






- Bacterial 3-hydroxybutyryl-CoA dehydrogenase (EC 1.1.1.157), which interconverts 3-hydroxybutanoyl-CoA and acetoacetyl-CoA [3].

- Eye lens protein lambda-crystallin [4], which is specific to lagomorphs (such as rabbit).

There are two major regions of similarity in the sequences of proteins of the HCDH family: the first, located in the N-terminal section, corresponds to the NAD-binding site; the second is located in the center of the sequence. A signature pattern has been derived from this central region.

- Consensus pattern: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-[LIVMFYCT]-[LIVMFY]-x(2)-[GV]

[ 1] Birktoft J.J., Holden H.M., Hamlin R., Xuong N.-H., Banaszak L.J. Proc. Natl. Acad. Sci. U.S.A. 84:8262-8266(1987).

[ 2] Nakahigashi K., Inokuchi H. Nucleic Acids Res. 18:4937-4937(1990).

[ 3] Mullany P., Clayton C.L., Pallen M.J., Slone R., Al-Saleh A., Tabaqchali S. FEMS Microbiol. Lett. 124:61-67(1994).

[ 4] Mulders J.W.M., Hendriks W., Blankesteijn W.M., Bloemendal H., de Jong W.W. J. Biol. Chem. 263:15462-15466(1988).

[1678] 737. 60S acidic ribosomal protein
Proteins P1, P2, and P0, components of the eukaryotic

ribosome stalk. New structural and functional aspects.

Remacha M, Jimenez-Diaz A, Santos C, Briones E, Zambrano R,

Rodriguez Gabriel MA, Guarinos E, Ballesta JP;

Biochem Cell Biol 1995;73:959-968.
This family includes archaebacterial L12, eukaryotic P0, P1 and P2.

[1679] 738. 6-phosphogluconate dehydrogenases 

6-phosphogluconate dehydrogenase (EC 1.1.1.44) (6PGD) catalyzes the third step in the hexose monophosphate shunt, the oxidative decarboxylation of 6-phosphogluconate to ribulose 5-phosphate.
[1680] Prokaryotic and eukaryotic 6PGD are proteins of about 470 amino acids whose sequences are highly conserved [1]. A region which has been shown [2], from studies of the sheep 6PGD tertiary structure, to be involved in the binding of 6-phosphogluconate has been selected as a signature pattern.

- Consensus pattern: [LIVM]-x-D-x(2)-[GA]-[NQS]-K-G-T-G-x-W

[ 1] Reizer A., Deutscher J., Saier M.H. Jr., Reizer J. 
Mol. Microbiol. 5:1081-1089(1991). 

[ 2] Adams M.J., Archibald I.G., Bugg C.E., Carne A., Gover S.,
Helliwell J.R., Pickersgill R.W., White S.W.
EMBO J. 2:1009-1014(1983). 



[1681] 739. (7tm_1) G-protein coupled receptors [1 to 4,E1,E2] (also called R7G) are an extensive group of hormone, neurotransmitter, odorant and light receptors that transduce extracellular signals by interaction with guanine nucleotide-binding (G) proteins. The receptors that are currently known to belong to this family are listed below.

- 5-hydroxytryptamine (serotonin) 1A to 1F, 2A to 2C, 4, 5A, 5B, 6 and 7 [5].

- Acetylcholine, muscarinic-type, M1 to M5.

- Adenosine A1, A2A, A2B and A3 [6].

- Adrenergic alpha-1A to -1C; alpha-2A to -2D; beta-1 to -3 [7].

- Angiotensin II types I and II.
- Bombesin subtypes 3 and 4.
- Bradykinin B1 and B2.

- C3a and C5a anaphylatoxin.
- Cannabinoid CB1 and CB2.

- Chemokines C-C CC-CKR-1 to CC-CKR-8.

- Chemokines C-X-C CXC-CKR-1 to CXC-CKR-4.
- Cholecystokinin-A and cholecystokinin-B/gastrin.
- Dopamine D1 to D5 [8].

- Endothelin ET-a and ET-b [9].

- fMet-Leu-Phe (fMLP) (N-formyl peptide).

- Follicle stimulating hormone (FSH-R) [10].
- Galanin.

- Gastrin-releasing peptide (GRP-R).

- Gonadotropin-releasing hormone (GNRH-R).

- Histamine H1 and H2 (gastric receptor I).

- Lutropin-choriogonadotropic hormone (LSH-R) [10].

- Melanocortin MC1R to MC5R.
- Melatonin.

- Neuromedin B (NMB-R).

- Neuromedin K (NK-3R).
- Neuropeptide Y types 1 to 6.
- Neurotensin (NT-R).
- Octopamine (tyramine), from insects.

- Odorants [11].

- Opioids delta-, kappa- and mu-types [12].

- Oxytocin (OT-R).

- Platelet activating factor (PAF-R).

- Prostacyclin.

- Prostaglandin D2.

- Prostaglandin E2, EP1 to EP4 subtypes.
- Prostaglandin F2.
- Purinoreceptors (ATP) [13].
- Somatostatin types 1 to 5.

- Substance-K (NK-2R).

- Substance-P (NK-1R).
- Thrombin.

- Thromboxane A2.
- Thyrotropin (TSH-R) [10].
- Thyrotropin releasing factor (TRH-R).
- Vasopressin V1a, V1b and V2.
- Visual pigments (opsins and rhodopsin) [14].
- Proto-oncogene mas.

- A number of orphan receptors (whose ligand is not known) from mammals and birds.
- Caenorhabditis elegans putative receptors C06G4.5, C38C10.1, C43C3.2, T27D1.3 and ZC84.4.
- Three putative receptors encoded in the genome of cytomegalovirus: US27, US28, and UL33.
- ECRF3, a putative receptor encoded in the genome of herpesvirus saimiri.

[1682] The structure of all these receptors is thought to be identical. They have seven hydrophobic regions, each of which most probably spans the membrane. The N-terminus is located on the extracellular side of the membrane and is often glycosylated, while the C-terminus is cytoplasmic and generally phosphorylated. Three extracellular loops alternate with three intracellular loops to link the seven transmembrane regions. Most, but not all, of these receptors lack a signal peptide. The most conserved parts of these proteins are the transmembrane regions and the first two cytoplasmic loops. A conserved acidic-Arg-aromatic triplet is present in the N-terminal extremity of the second cytoplasmic loop [15] and could be implicated in the interaction with G proteins.
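The seven hydrophobic, membrane-spanning segments described above are what hydropathy analysis is designed to reveal. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 cut-off, the settings usually quoted for spotting candidate transmembrane helices; both settings are assumptions rather than values from this document, real topology prediction uses more elaborate methods, and the test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydrophobic_segments(seq, window=19, threshold=1.6):
    """Merge consecutive windows whose mean hydropathy exceeds `threshold`.

    Returns 0-based, half-open (start, end) spans covering each run of
    qualifying windows.
    """
    scores = [KD[aa] for aa in seq]
    segments, current = [], None
    for i in range(len(seq) - window + 1):
        mean = sum(scores[i:i + window]) / window
        if mean > threshold:
            if current is None:
                current = [i, i + window]    # open a new candidate segment
            else:
                current[1] = i + window      # extend the open segment
        elif current is not None:
            segments.append(tuple(current))  # close the segment
            current = None
    if current is not None:
        segments.append(tuple(current))
    return segments


# Synthetic receptor-like sequence: seven 20-residue Leu blocks separated by 25-residue Asp spacers.
demo = "D" * 25 + ("L" * 20 + "D" * 25) * 7
segments = hydrophobic_segments(demo)
print(len(segments))  # 7
print(segments[0])    # (20, 50)
```

Because the spacers are longer than the window, no window can reach two hydrophobic blocks at once, so each block yields exactly one segment; with natural sequences, short loops can merge neighbouring helices.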

[1 683] To detect this widespread family of proteins, a pattern that contains the conserved triplet and that also spans 
the major part of the third transmembrane helix has been developed. 

- Consensus pattern: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-[LIVMNQGA]-x(2)-[LIVMFT]-[GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-[LIVM]

[ 1] Strosberg A.D.

Eur. J. Biochem. 196:1-10(1991).

[ 2] Kerlavage A.R.

Curr. Opin. Struct. Biol. 1:394-401(1991).

[ 3] Probst W.C., Snyder L.A., Schuster D.L., Brosius J., Sealfon S.C.

DNA Cell Biol. 11:1-20(1992).

[ 4] Savarese T.M., Fraser C.M.

Biochem. J. 283:1-9(1992).

[ 5] Branchek T.

Curr. Biol. 3:315-317(1993).

[ 6] Stiles G.L.

J. Biol. Chem. 267:6451-6454(1992).

[ 7] Frielle T., Kobilka B.K., Lefkowitz R.J., Caron M.G.

Trends Neurosci. 11:321-324(1988).

[ 8] Stevens C.R

Curr. Biol. 1:20-22(1991).

[ 9] Sakurai T., Yanagisawa M., Masaki T.

Trends Pharmacol. Sci. 13:103-107(1992).

[10] Salesse R., Remy J.J., Levin J.M., Jallal B., Garnier J.

Biochimie 73:109-120(1991).

[11] Lancet D., Ben-Arie N.

Curr. Biol. 3:668-674(1993).

[12] Uhl G.R., Childers S., Pasternak G.

Trends Neurosci. 17:89-93(1994).

[13] Barnard E.A., Burnstock G., Webb T.E.

Trends Pharmacol. Sci. 15:67-70(1994).

[14] Applebury M.L., Hargrave P.A.

Vision Res. 26:1881-1895(1986).

[15] Attwood T.K., Eliopoulos E.E., Findlay J.B.C.

Gene 98:153-159(1991).

[1684] (7tm_1) Visual pigments (opsins) retinal binding site

Visual pigments [1,2] are the light-absorbing molecules that mediate vision. They consist of an apoprotein, opsin, covalently linked to the chromophore cis-retinal. Vision is effected through the absorption of a photon by cis-retinal, which is isomerized to trans-retinal. This isomerization leads to a change of conformation of the protein. Opsins are integral membrane proteins with seven transmembrane regions that belong to family 1 of G-protein coupled receptors.
[1685] In vertebrates four different pigments are generally found. Rod cells, which mediate vision in dim light, contain the pigment rhodopsin. Cone cells, which function in bright light, are responsible for color vision and contain three or more color pigments (for example, in mammals: red, blue and green).
[1686] In Drosophila, the eye is composed of 800 facets or ommatidia. Each ommatidium contains eight photoreceptor cells (R1-R8): the R1 to R6 cells are outer cells, and R7 and R8 are inner cells. Each of the three types of cells (R1-R6, R7 and R8) expresses a specific opsin.

[1687] Proteins evolutionarily related to opsins include squid retinochrome, also known as retinal photoisomerase, which converts various isomers of retinal into 11-cis retinal, and mammalian retinal pigment epithelium (RPE) RGR [3], a protein that may also act in retinal isomerization.

[1688] The attachment site for retinal in the above proteins is a conserved lysine residue in the middle of the seventh transmembrane helix. The signature pattern developed for this site includes this residue.

- Consensus pattern: [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-[IY]

[K is the retinal binding site]
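Because the retinal-binding lysine is the only fixed K in the opsin signature, wrapping it in a named capture group recovers its sequence position directly from a match. The sketch below assumes the pattern reads [LIVMWAC]-[PGC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-[STACP]-x(2)-[DENF]-[AP]-x(2)-[IY]; the names `OPSIN_SITE` and `retinal_lysines` are hypothetical, and the demonstration peptide is synthetic.

```python
import re

# Opsin retinal-binding-site pattern translated by hand into a regex;
# the lysine is captured in the named group "lys".
OPSIN_SITE = re.compile(
    r"[LIVMWAC][PGC].{3}[SAC](?P<lys>K)[STALIMR][GSACPNV][STACP].{2}[DENF][AP].{2}[IY]"
)


def retinal_lysines(seq):
    """Return the 1-based positions of lysines that sit inside an opsin-site match."""
    return [m.start("lys") + 1 for m in OPSIN_SITE.finditer(seq)]


demo = "MTIPAFFAKSAAIYNPVIYIMM"  # synthetic peptide
print(retinal_lysines(demo))     # [9]
```

Because the stretch before the lysine has a fixed length (seven elements), the lysine always sits at a constant offset from the match start; the named group simply avoids hard-coding that offset.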

[ 1] Applebury M.L., Hargrave P.A.
Vision Res. 26:1881-1895(1986).
[ 2] Fryxell K.J., Meyerowitz E.M.

J. Mol. Evol. 33:367-378(1991).

[ 3] Shen D., Jiang M., Hao W., Tao L., Salazar M., Fong H.K.W.
Biochemistry 33:13117-13125(1994).

[1689] The following descriptions of protein family functions are not provided by the Pfam or PROSITE databases.
[1690] 740. BAH

BAH domain. Number of members: 65

[1] Medline: 97074677. Molecular cloning of polybromo, a nuclear protein containing multiple domains including five bromodomains, a truncated HMG-box, and two repeats of a novel domain. Nicolas RH, Goodwin GH; Gene 1996;175:233-240.

[2] Medline: 99198739. The BAH (bromo-adjacent homology) domain: a link between DNA methylation, replication and transcriptional regulation. Callebaut I, Courvalin J-C, Mornon JP; FEBS Lett 1999;446:189-193.

[1691] 741. ELM2

ELM2 domain. The ELM2 (Egl-27 and MTA1 homology 2) domain is a small domain of unknown function. Number of 
members: 10 

[1692] 742. Euk_porin. EUKARYOTIC_PORIN. The major protein of the outer mitochondrial membrane of eukaryotes is a porin that forms a voltage-dependent anion-selective channel (VDAC) that behaves as a general diffusion pore for small hydrophilic molecules [1 to 4]. The channel adopts an open conformation at low or zero membrane potential and a closed conformation at potentials above 30-40 mV.

This protein contains about 280 amino acids, and its sequence is composed of 12 to 16 beta-strands that span the mitochondrial outer membrane. Yeast contains two members of this family (genes POR1 and POR2); vertebrates have at least three members (genes VDAC1, VDAC2 and VDAC3) [5].
A conserved region located in the C-terminal part of these proteins was selected as a signature pattern.

Consensus pattern: [YH]-x(2)-D-[SPCAD]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-[DNSTA]-[DNS]-x(4)-[GSTAN]-[LIVMA]-x-[LIVMY]

[ 1] Benz R. Biochim. Biophys. Acta 1197:167-196(1994).
[ 2] Mannella C.A. Trends Biochem. Sci. 17:315-320(1992).

[ 3] Dihanich M. Experientia 46:146-153(1990).

[ 4] Forte M., Guy H.R., Mannella C.A. J. Bioenerg. Biomembr. 19:341-350(1987).

[ 5] Sampson M.J., Lovell R.S., Davison D.B., Craigen W.J. Genomics 36:192-196(1996).

[1693] 743. Glyco_hydro_19
Chitinases family 19 signatures 

cross-reference(s) CHITINASE_19_1, CHITINASE_19_2 

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or family 19 in the classification of glycosyl hydrolases [2,E1]. Chitinases of family 19 (also known as classes IA or I and IB or II) are enzymes from plants that function in the defense against fungal and insect pathogens by destroying their chitin-containing cell wall. Class IA/I and IB/II enzymes differ in the presence (IA/I) or absence (IB/II) of an N-terminal chitin-binding domain (see the relevant entry <PDOC00025>). The catalytic domain of these enzymes consists of about 220 to 230 amino acid residues.

Two highly conserved regions were selected as signature patterns. The first one is located in the N-terminal section and contains one of the six cysteines that are conserved in most, if not all, of these chitinases and that is probably involved in a disulfide bond.

Consensus pattern: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA]
[1694] Consensus pattern: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).
[ 2] Henrissat B. Biochem. J. 280:309-316(1991).

[1695] 744. MBD 
Methyl-CpG binding domain 

The methyl-CpG binding domain (MBD) binds to DNA that contains one or more symmetrically methylated CpGs [1]. DNA methylation in animals is associated with alterations in chromatin structure and silencing of gene expression. MBD has negligible non-specific affinity for DNA. In vitro footprinting with MeCP2 showed that the MBD can protect a 12-nucleotide region surrounding a methyl-CpG pair [1]. MBDs are found in several methyl-CpG binding proteins and also in a DNA demethylase [2]. Number of members: 11
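Methylation of a CpG can be symmetrical because the CpG dinucleotide reads the same on both strands: its reverse complement is again CpG, so every CpG on one strand is paired with a CpG on the other. The short sketch below illustrates only that sequence property on a synthetic fragment. It does not model MBD binding or methylation, and the helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def cpg_sites(dna):
    """0-based positions of the C of every CpG dinucleotide on this strand."""
    return [i for i in range(len(dna) - 1) if dna[i:i + 2] == "CG"]


top = "ATCGGACGTT"                 # synthetic fragment
bottom = reverse_complement(top)
print(reverse_complement("CG"))   # CG: the dinucleotide is its own reverse complement
print(cpg_sites(top))             # [2, 6]
# Map each bottom-strand CpG cytosine back to top-strand coordinates:
# it sits opposite the G of a top-strand CpG.
print(sorted(len(top) - 1 - p for p in cpg_sites(bottom)))  # [3, 7]
```

Each top-strand CpG at position i is matched by a bottom-strand cytosine opposite position i + 1, which is the pair of methylated cytosines a symmetrically methylated site carries.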

[1] Medline: 94232813. Dissection of the methyl-CpG binding domain from the chromosomal protein MeCP2. Nan X, Meehan RR, Bird A; Nucleic Acids Res 1993;21:4886-4892.

[2] Medline: 99158138. A mammalian protein with specific demethylase activity for mCpG DNA. Bhattacharya SK, Ramchandani S, Cervoni N, Szyf M; Nature 1999;397:579-583.

[1696] 745. Peptidase_C1

Eukaryotic thiol (cysteine) proteases active sites

cross-reference(s) THIOL_PROTEASE_CYS; THIOL_PROTEASE_HIS; THIOL_PROTEASE_ASN

Eukaryotic thiol proteases (EC 3.4.22.-) [1] are a family of proteolytic enzymes that contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. The proteases currently known to belong to this family are listed below (references are only provided for recently determined sequences).

- Vertebrate lysosomal cathepsins B (EC 3.4.22.1), H (EC 3.4.22.16), L (EC 3.4.22.15), and S (EC 3.4.22.27) [2].

- Vertebrate lysosomal dipeptidyl peptidase I (EC 3.4.14.1) (also known as cathepsin C) [2].

- Vertebrate calpains (EC 3.4.22.17). Calpains are intracellular calcium-activated thiol proteases that contain both an N-terminal catalytic domain and a C-terminal calcium-binding domain.

- Mammalian cathepsin K, which seems to be involved in osteoclastic bone resorption [3].
- Human cathepsin O [4].

- Bleomycin hydrolase, an enzyme that catalyzes the inactivation of the antitumor drug BLM (a glycopeptide).

- Plant enzymes: barley aleurain (EC 3.4.22.16), EP-B1/B4; kidney bean EP-C1; rice bean SH-EP; kiwi fruit actinidin (EC 3.4.22.14); papaya latex papain (EC 3.4.22.2), chymopapain (EC 3.4.22.6), caricain (EC 3.4.22.30), and proteinase IV (EC 3.4.22.25); pea turgor-responsive protein 15A; pineapple stem bromelain (EC 3.4.22.32); rape COT44; rice oryzain alpha, beta, and gamma; tomato low-temperature induced; Arabidopsis thaliana A494, RD19A and RD21A.

- House-dust mite allergens DerP1 and EurM1.

- Cathepsin B-like proteinases from the worms Caenorhabditis elegans (genes gcp-1, cpr-3, cpr-4, cpr-5 and cpr-6), Schistosoma mansoni (antigen SM31) and japonicum (antigen SJ31), Haemonchus contortus (genes AC-1 and AC-2), and Ostertagia ostertagi (CP-1 and CP-3).

- Slime mold cysteine proteinases CP1 and CP2.
- Cruzipain from Trypanosoma cruzi and brucei.

- Trophozoite cysteine proteinase (TCP) from various Plasmodium species.
- Proteases from Leishmania mexicana, Theileria annulata and Theileria parva.
- Baculoviruses cathepsin-like enzyme (v-cath).
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- Drosophila small optic lobes protein (gene sol), a neuronal protein that contains a calpain-iike domain 

- Yeast thbl protease BLHIA^CPI/LAPS. 

- Caenorhabditis elegans hypothetical protein C06G4.2, a calpain-like protein. 
[1 697] Two bacterial peptidases are also part of this famity: 

- Aminopeptidase C from Lactococcus lactis (gene pepC) [5]. 
Thiol protease tpr from Porphyrorrranas gingival is. 

[1698] Three other proteins are structurally related to this family, but may have lost their proteolytic activity.

- Soybean oil body protein P34. This protein has its active site cysteine replaced by a glycine. 

- Rat testin, a Sertoli cell secretory protein highly similar to cathepsin L but in which the active site cysteine is
replaced by a serine. Rat testin should not be confused with mouse testin, which is a LIM-domain protein (see
<PDOC00382>).

- Plasmodium falciparum serine-repeat protein (SERA), the major blood stage antigen. This 111 Kd protein
possesses a C-terminal thiol-protease-like domain [6], but the active site cysteine is replaced by a serine.

The sequences around the three active site residues are well conserved and can be used as signature patterns.
[1699] Consensus pattern: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV] [C is the active site residue]
Note: the residue in position 4 of the pattern is almost always cysteine; the only exceptions are calpains (Leu),
bleomycin hydrolase (Ser) and yeast YCP1 (Ser). Note: the residue in position 5 of the pattern is always Gly except in
papaya protease IV, where it is Glu.
Consensus pattern: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH] [H is the active site residue]

Consensus pattern: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFYG]-x-[LIVMF] [N
is the active site residue]
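The consensus patterns above use PROSITE notation: elements are separated by '-', 'x' stands for any residue, '[...]' lists the residues allowed at a position, '{...}' lists residues that are excluded, and '(n)' or '(n,m)' gives a repeat count or range. As a rough illustration, the Python sketch below translates such a pattern into a regular expression and reports where the active-site cysteine falls in a matching sequence. The helper `prosite_to_regex` and its `capture` argument are hypothetical, the test sequence is made up, and the sketch ignores PROSITE features not used in this section (such as the '<' and '>' terminal anchors).

```python
import re

# One PROSITE element: a residue, "x", an allowed set [..] or an excluded
# set {..}, optionally followed by a repeat count (n) or a range (n,m).
ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern, capture=None):
    """Translate a PROSITE-style pattern into a Python regex string.

    If `capture` is an element index (0-based), that element is wrapped in a
    group so its position in a matching sequence can be read back.
    """
    parts = []
    for i, element in enumerate(pattern.split("-")):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unrecognised pattern element: {element!r}")
        residues, low, high = m.groups()
        if residues == "x":
            residues = "."                            # any residue
        elif residues.startswith("{"):
            residues = "[^" + residues[1:-1] + "]"    # any residue except these
        if high is not None:
            residues += "{%s,%s}" % (low, high)
        elif low is not None:
            residues += "{%s}" % low
        if i == capture:
            residues = "(" + residues + ")"
        parts.append(residues)
    return "".join(parts)


CYS_PATTERN = "Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV]"

seq = "MKTQGSCGSCWAFSAVL"                        # made-up sequence, not a real protein
regex = prosite_to_regex(CYS_PATTERN, capture=4)  # element 4 is the active-site C
m = re.search(regex, seq)
print(regex)           # Q.{3}[GE].(C)[YW].{2}[STAGC][STAGCV]
print(m.start(1) + 1)  # 1-based position of the active-site Cys: 10
```

Because the pattern fixes a C at the active-site position, proteins such as soybean P34 or rat testin, in which that cysteine is replaced, would not be expected to match it.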

Note: these proteins belong to families C1 (papain-type) and C2 (calpains) in the classification of peptidases [7,E1].
[1] Dufour E. Biochimie 70:1335-1342(1988).

[2] Kirschke H., Barrett A.J., Rawlings N.D. Protein Prof. 2:1587-1643(1995).

[3] Shi G.-P., Chapman H.A., Bhairi S.M., Deleeuw C., Reddy V.Y., Weiss S.J. FEBS Lett. 357:129-134(1995).
[4] Velasco G., Ferrando A.A., Puente X.S., Sanchez L.M., Lopez-Otin C. J. Biol. Chem. 269:27136-27142(1994).
[5] Chapot-Chartier M.P., Nardi M., Chopin M.C., Chopin A., Gripon J.C. Appl. Environ. Microbiol. 59:330-333
(1993).

[6] Higgins D.G., McConnell D.J., Sharp P.M. Nature 340:604-604(1989).
[7] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:461-486(1994).

[1700] 746. Peptidase M22 

Glycoprotease family signature, cross-reference(s): GLYCOPROTEASE

Glycoprotease (GCP) (EC 3.4.24.57) [1], or O-sialoglycoprotein endopeptidase, is a metalloprotease secreted by
Pasteurella haemolytica which specifically cleaves O-sialoglycoproteins such as glycophorin A. The sequence of GCP is
highly similar to the following uncharacterized proteins:



- Escherichia coli hypothetical protein ygjD (ORF-X).
- Bacillus subtilis hypothetical protein ydiE.
- Mycobacterium leprae hypothetical protein U229E.
- Mycobacterium tuberculosis hypothetical protein MtCY78.10.
- Synechocystis strain PCC 6803 hypothetical protein slr0807.
- Methanococcus jannaschii hypothetical protein MJ1130.
- Haloarcula marismortui hypothetical protein in HSH 3' region.
- Yeast hypothetical protein YKR038c.
- Yeast hypothetical protein QRI7.



[1701] One of the conserved regions contains two conserved histidines. It is possible that this region is involved in
coordinating a metal ion such as zinc.

[1702] Consensus pattern: [KR]-[GSAT]-x(4)-[FYWLH]-[EXaNGK]-x-P-x-[LIVMFY]-x(3)-H-x(2)-[AG]-H-[LIVM]
Note: these proteins belong to family M22 in the classification of peptidases [2,E1].
[1] Abdullah K.M., Lo R.Y.C., Mellors A. J. Bacteriol. 173:5597-5603(1991).
[2] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[1703] 747. SAM. SAM domain (Sterile alpha motif) 

It has been suggested that SAM is an evolutionarily conserved protein binding domain that is involved in the regulation
of numerous developmental processes in diverse eukaryotes. The SAM domain can potentially function as a protein
interaction module through its ability to homo- and hetero-oligomerise with other SAM domains. Number of members: 81

[1] Medline: 96100659. SAM: A novel motif in yeast sterile alpha and Drosophila polyhomeotic proteins. Ponting CP;
Prot Sci 1995;4:1928-1930.
[2] Medline: 97160498. SAM as a protein interaction domain involved in developmental regulation. Schultz J, Ponting
CP, Hofmann K, Bork P; Prot Sci 1997;6:249-253.

[3] Medline: 99101382. The crystal structure of an Eph receptor SAM domain reveals a mechanism for modular
dimerization. Stapleton D, Balan I, Pawson T, Sicheri F; Nat Struct Biol 1999;6:44-49.

[1704] 748. Tyrosinase signatures, cross-reference(s): TYROSINASE_1; TYROSINASE_2

Tyrosinase (EC 1.14.18.1) [1] is a copper monooxygenase that catalyzes the hydroxylation of monophenols and the
oxidation of o-diphenols to o-quinols. This enzyme, found in prokaryotes as well as in eukaryotes, is involved in the
formation of pigments such as melanins and other polyphenolic compounds.

[1705] Tyrosinase binds two copper ions (CuA and CuB). Each of the two copper ions has been shown [2] to be bound
by three conserved histidine residues. The regions around these copper-binding ligands are well conserved and also
shared by some hemocyanins, which are copper-containing oxygen carriers from the hemolymph of many molluscs
and arthropods [3,4].

[1706] At least two proteins related to tyrosinase are known to exist in mammals: 

- TRP-1 (TYRP1) [5], which is responsible for the conversion of 5,6-dihydroxyindole-2-carboxylic acid (DHICA) to 
indole-5,6-quinone-2-carboxylic acid. 

- TRP-2 (TYRP2) [6], which is the melanogenic enzyme DOPAchrome tautomerase (EC 5.3.3.12) that catalyzes
the conversion of DOPAchrome to DHICA. TRP-2 differs from tyrosinases and TRP-1 in that it binds two zinc ions
instead of copper [7].

[1707] Other proteins that belong to this family are:

- Plant polyphenol oxidases (PPO) (EC 1.10.3.1), which catalyze the oxidation of mono- and o-diphenols to
o-diquinones [8].

- Caenorhabditis elegans hypothetical protein C02C2.1.

[1708] Two signature patterns for tyrosinase and related proteins have been derived. The first one contains two of
the histidines that bind CuA, and is located in the N-terminal section of tyrosinase. The second pattern contains a
histidine that binds CuB; that pattern is located in the central section of the enzyme.

Consensus pattern: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E [The two H's are copper ligands]

[1709] Consensus pattern: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D [H is a copper ligand]

[1] Lerch K. Prog. Clin. Biol. Res. 256:85-98(1988).

[2] Jackman M.P., Hajnal A., Lerch K. Biochem. J. 274:707-713(1991).

[3] Linzen B. Naturwissenschaften 76:206-211(1989).

[4] Lang W.H., van Holde K.E. Proc. Natl. Acad. Sci. U.S.A. 88:244-248(1991).

[5] Kobayashi T., Urabe K., Winder A., Jimenez-Cervantes C., Imokawa G., Brewington T., Solano F., Garcia-
Borron J.C., Hearing V.J. EMBO J. 13:5818-5825(1994).

[6] Jackson I.J., Chambers D.M., Tsukamoto K., Copeland N.G., Gilbert D.J., Jenkins N.A., Hearing V. EMBO J.
11:527-535(1992).

[7] Solano F., Martinez-Liarte J.H., Jimenez-Cervantes C., Garcia-Borron J.C., Lozano J.A. Biochem. Biophys.
Res. Commun. 204:1243-1250(1994).

[8] Cary J.W., Lax A.R., Flurkey W.H. Plant Mol. Biol. 20:245-253(1992).
[1710] 749. (Mur Ligase) Folylpolyglutamate synthase signatures 

Folylpolyglutamate synthase (EC 6.3.2.17) (FPGS) [1] is the enzyme of folate metabolism that catalyzes ATP-
dependent addition of glutamate moieties to tetrahydrofolate.
[1711] Its sequence is moderately conserved between prokaryotes (gene folC) and eukaryotes. We developed two
signature patterns based on the conserved regions, which are rich in glycine residues and could play a role in the
catalytic activity and/or in substrate binding.
Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-[LIVM](2)-x(...)
Consensus pattern: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-[LIVM](2)

[1712] [1] Shane B., Garrow T., Brenner A., Chen L., Choi Y.J., Hsu J.C., Stover P. Adv. Exp. Med. Biol. 338:629-634
(1993).

[1713] 750. (Peptidase M3) Neutral zinc metallopeptidases, zinc-binding region signature 
The majority of zinc-dependent metallopeptidases (with the notable exception of the carboxypeptidases) share a
common pattern of primary structure [1,2,3] in the part of their sequence involved in the binding of zinc, and on the
basis of this sequence similarity can be grouped together as a superfamily known as the metzincins. They can be
classified into a number of distinct families [4,E1], which are listed below along with the proteases currently known to
belong to these families.
[1714] Family M1 

- Bacterial aminopeptidase N (EC 3.4.11.2) (gene pepN).
- Mammalian aminopeptidase N (EC 3.4.11.2).

- Mammalian glutamyl aminopeptidase (EC 3.4.11.7) (aminopeptidase A). It may play a role in regulating growth 
and differentiation of early B-lineage cells. 

- Yeast aminopeptidase yscII (gene APE2).

- Yeast alanine/arginine aminopeptidase (gene AAP1).

- Yeast hypothetical protein YIL137c.

- Leukotriene A-4 hydrolase (EC 3.3.2.6). This enzyme is responsible for the hydrolysis of an epoxide moiety of
LTA-4 to form LTB-4; it has been shown to bind zinc and to be capable of peptidase activity.

[1715] Family M2

- Angiotensin-converting enzyme (EC 3.4.15.1) (dipeptidyl carboxypeptidase I) (ACE), the enzyme responsible for
hydrolyzing angiotensin I to angiotensin II. There are two forms of ACE: a testis-specific isozyme and a somatic
isozyme which has two active centers.

[1716] Family M3 

- Thimet oligopeptidase (EC 3.4.24.15). a mammalian enzyme involved in the cytoplasmic degradation of small 
peptides. 

- Neurolysin (EC 3.4.24.16) (also known as mitochondrial oligopeptidase M or microsomal endopeptidase). 

- Mitochondrial intermediate peptidase precursor (EC 3.4.24.59) (MIP). It is involved in the second stage of processing
of some proteins imported into the mitochondrion.

- Yeast saccharolysin (EC 3.4.24.37) (proteinase yscD). 

- Escherichia coli and related bacteria dipeptidyl carboxypeptidase (EC 3.4.15.5) (gene dcp).

- Escherichia coli and related bacteria oligopeptidase A (EC 3.4.24.70) (gene opdA or prlC).

- Yeast hypothetical protein YKL134c. 

[1717] Family M4 

- Thermostable thermolysins (EC 3.4.24.27), and related thermolabile neutral proteases (bacillolysins) (EC
3.4.24.28) from various species of Bacillus.

- Pseudolysin (EC 3.4.24.26) from Pseudomonas aeruginosa (gene lasB).

- Extracellular elastase from Staphylococcus epidermidis.
- Extracellular protease prt1 from Erwinia carotovora.
- Extracellular minor protease smp from Serratia marcescens.

- Vibriolysin (EC 3.4.24.25) from various species of Vibrio.
- Protease prtA from Listeria monocytogenes.
- Extracellular proteinase proA from Legionella pneumophila.

[1718] Family M5 
- Mycolysin (EC 3.4.24.31) from Streptomyces cacaoi.
[1719] Family M6 

- Immune inhibitor A from Bacillus thuringiensis (gene ina). Ina degrades two classes of insect antibacterial proteins,
attacins and cecropins.

[1720] Family M7 

- Streptomyces extracellular small neutral proteases.
[1721] Family M8 

- Leishmanolysin (EC 3.4.24.36) (surface glycoprotein gp63), a cell surface protease from various species of
Leishmania.

[1722] Family M9 

- Microbial collagenase (EC 3.4.24.3) from Clostridium perfringens and Vibrio alginolyticus.
[1723] Family M10A 

- Serralysin (EC 3.4.24.40), an extracellular metalloprotease from Serratia.

- Alkaline metalloproteinase from Pseudomonas aeruginosa (gene aprA). 

- Secreted proteases A, B, C and G from Erwinia chrysanthemi. 

- Yeast hypothetical protein YIL108w.

[1724] Family M10B

- Mammalian extracellular matrix metalloproteinases (known as matrixins) [5]: MMP-1 (EC 3.4.24.7) (interstitial
collagenase), MMP-2 (EC 3.4.24.24) (72 Kd gelatinase), MMP-9 (EC 3.4.24.35) (92 Kd gelatinase), MMP-7 (EC
3.4.24.23) (matrilysin), MMP-8 (EC 3.4.24.34) (neutrophil collagenase), MMP-3 (EC 3.4.24.17) (stromelysin-1),
MMP-10 (EC 3.4.24.22) (stromelysin-2), MMP-11 (stromelysin-3), and MMP-12 (EC 3.4.24.65) (macrophage
metalloelastase).

- Sea urchin hatching enzyme (envelysin) (EC 3.4.24.12), a protease that allows the embryo to digest the protective
envelope derived from the egg extracellular matrix.

- Soybean metalloendoproteinase 1.

[1725] Family M11

- Chlamydomonas reinhardtii gamete lytic enzyme (GLE).
[1726] Family M12A 

- Astacin (EC 3.4.24.21), a crayfish endoprotease.

- Meprin A (EC 3.4.24.18), a mammalian kidney and intestinal brush border metalloendopeptidase.

- Bone morphogenic protein 1 (BMP-1), a protein which induces cartilage and bone formation and which expresses
metalloendopeptidase activity. The Drosophila homolog of BMP-1 is the dorsal-ventral patterning protein tolloid.

- Blastula protease 10 (BP10) from Paracentrotus lividus and the related protein SpAN from Strongylocentrotus
purpuratus.
- Caenorhabditis elegans protein toh-2.

- Caenorhabditis elegans hypothetical protein F42A10.8.

- Choriolysins L and H (EC 3.4.24.67) (also known as embryonic hatching proteins LCE and HCE) from the fish
Oryzias latipes. These proteases participate in the breakdown of the egg envelope, which is derived from the
egg extracellular matrix, at the time of hatching.

[1727] Family M12B 
- Snake venom metalloproteinases [6]. This subfamily mostly groups proteases that act in hemorrhage. Examples
are: adamalysin II (EC 3.4.24.46), atrolysin C/D (EC 3.4.24.42), atrolysin E (EC 3.4.24.44), fibrolase (EC 3.4.24.72),
trimerelysin I (EC 3.4.24.52) and II (EC 3.4.24.53).

- Mouse cell surface antigen MS2. 

[1728] Family M13

- Mammalian neprilysin (EC 3.4.24.11) (neutral endopeptidase) (NEP).

- Endothelin-converting enzyme 1 (EC 3.4.24.71) (ECE-1), which processes the precursor of endothelin to release the
active peptide.

- Kell blood group glycoprotein, a major antigenic protein of erythrocytes. The Kell protein is very probably a zinc
endopeptidase.

- Peptidase O from Lactococcus lactis (gene pepO). 

[1729] Family M27 

- Clostridial neurotoxins, including tetanus toxin (TeTx) and the various botulinum toxins (BoNT). These toxins are
zinc proteases that block neurotransmitter release by proteolytic cleavage of synaptic proteins such as
synaptobrevins, syntaxin and SNAP-25 [7,8].

[1730] Family M30 

- Staphylococcus hyicus neutral metalloprotease. 
[1731] Family M32 

- Thermostable carboxypeptidase 1 (EC 3.4.17.19) (carboxypeptidase Taq), an enzyme from Thermus aquaticus
which is most active at high temperature.

[1732] Family M34 

- Lethal factor (LF) from Bacillus anthracis, one of the three proteins composing the anthrax toxin. 
[1733] Family M35 

- Deuterolysin (EC 3.4.24.39) from Penicillium citrinum and related proteases from various species of Aspergillus.
[1734] Family M36 

- Extracellular elastinolytic metalloproteinases from Aspergillus. 

[1735] From the tertiary structure of thermolysin, the positions of the residues acting as zinc ligands and of those
involved in the catalytic activity are known. Two of the zinc ligands are histidines which are very close together in the
sequence. C-terminal to the first histidine is a glutamic acid residue which acts as a nucleophile and promotes the attack
of a water molecule on the carbonyl carbon of the substrate. A signature pattern which includes the two histidines and
the glutamic acid residue is sufficient to detect this superfamily of proteins.
[1736] Description of pattern(s) and/or profile(s)

Consensus pattern: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ] [The two H's are zinc ligands]
[E is the active site residue]
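To make the notation concrete, the sketch below hand-translates this zinc-binding pattern into a Python regular expression; the exclusion set {DEHRKP} becomes the negated class [^DEHRKP], and the two histidines and the glutamate are captured so their positions can be reported. It is an illustration only: the function name `zinc_site` and both test fragments are made up, and a match in a real sequence is evidence of the motif, not proof of protease activity.

```python
import re

# Hand translation of [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ].
# Groups 1-3 capture the first His, the active-site Glu and the second His.
ZINC_REGEX = re.compile(r"[GSTALIVN].{2}(H)(E)[LIVMFYW][^DEHRKP](H).[LIVMFYWGSPQ]")


def zinc_site(seq):
    """Return 1-based positions (His, Glu, His) of the first match, or None."""
    m = ZINC_REGEX.search(seq)
    if m is None:
        return None
    return m.start(1) + 1, m.start(2) + 1, m.start(3) + 1


# Made-up fragments: the second has P immediately before the second H,
# which the excluded set {DEHRKP} forbids.
print(zinc_site("MAVTAHELGHAL"))  # (6, 7, 10)
print(zinc_site("MAVTAHELPHAL"))  # None
```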

Sequences known to belong to this class detected by the pattern: ALL, except for members of families M5, M7 and M11.

Other sequence(s) detected in SWISS-PROT: 55, including Neurospora crassa conidiation-specific protein 13, which
could be a zinc protease.

[1] Jongeneel C.V., Bouvier J., Bairoch A. FEBS Lett. 242:211-214(1989).

[2] Murphy G.J.P., Murphy G., Reynolds J.J. FEBS Lett. 289:4-7(1991).

[3] Bode W., Grams F., Reinemer P., Gomis-Rueth F.-X., Baumann U., McKay D.B., Stoecker W. Zoology
99:237-246(1996).

[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 248:183-228(1995).

[5] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[6] Hite L.A., Fox J.W., Bjarnason J.B.

[7] Montecucco C., Schiavo G. Trends Biochem. Sci. 18:324-327(1993).

[8] Niemann H., Blasi J., Jahn R. Trends Cell Biol. 4:179-185(1994).

[1737] 751. PseudoU_synt_1 

tRNA pseudouridine synthase is involved in the formation of pseudouridine at the anticodon stem and loop of transfer
RNAs. Pseudouridine is an isomer of uridine (5-(beta-D-ribofuranosyl) uracil) and is the most abundant modified
nucleoside found in all cellular RNAs. The TruA-like proteins also exhibit a conserved sequence with a strictly conserved
aspartic acid, likely involved in catalysis. Number of members: 25

[1738] [1] Medline: 98254513. Transfer RNA-pseudouridine synthetase Pus1 of Saccharomyces cerevisiae contains
one atom of zinc essential for its native conformation and tRNA recognition. Arluison V, Hountondji C, Robert B,
Grosjean H; Biochemistry 1998;37:7268-7276.
[1739] 752. EPSP synthase signatures 

EPSP synthase (3-phosphoshikimate 1-carboxyvinyltransferase) (EC 2.5.1.19) catalyzes the sixth step in the
biosynthesis from chorismate of the aromatic amino acids (the shikimate pathway) in bacteria (gene aroA), plants and
fungi (where it is part of a multifunctional enzyme which catalyzes five consecutive steps in this pathway) [1]. EPSP
synthase has been extensively studied as it is the target of the potent herbicide glyphosate, which inhibits the enzyme.
[1740] The sequence of EPSP synthase from various biological sources shows that the structure of the enzyme has
been well conserved throughout evolution. Two conserved regions were selected as signature patterns. The first pattern
corresponds to a region that is part of the active site and which is also important for resistance to glyphosate [2]. The
second pattern is located in the C-terminal part of the protein and contains a conserved lysine which seems to be
important for the activity of the enzyme.
[1741] Description of pattern(s) and/or profile(s)

[1742] Consensus pattern: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA]
Consensus pattern: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-[KRA]

[1] Stallings W.C., Abdel-Meguid S.S., Lim L.W., Shieh H.-S., Dayringer H.E., Leimgruber N.K., Stegeman R.A.,
Anderson K.S., Sikorski J.A., Padgette S.R., Kishore G.M. Proc. Natl. Acad. Sci. U.S.A. 88:5046-5050(1991).
[2] Padgette S.R., Re D.B., Gasser C.S., Eichholtz D.A., Frazier R.B., Hironaka C.M., Levine E.B., Shah D.M.,
Fraley R.T., Kishore G.M. J. Biol. Chem. 266:22364-22369(1991).

[1743] 753. Glyco_hydro_18

Glycosyl hydrolases family 18. Number of members: 173 

[1] Medline: 95219379. Crystal structure of a bacterial chitinase at 2.3 A resolution. Perrakis A, Tews I, Dauter Z,
Oppenheim AB, Chet I, Wilson KS, Vorgias CE; Structure 1994;2:1169-1180.
[1744] 754. Esterase 
Putative esterase 

This family contains Esterase D (Swiss:P10768). However, it is not clear if all members of the family have the same
function. This family is possibly related to the COesterase family.

Number of members: 36 

[1745] 755. (HMA) Heavy-metal-associated domain 

A conserved domain of about 30 amino acid residues has been found [1] in a number of proteins that transport or
detoxify heavy metals. This domain contains two conserved cysteines that could be involved in the binding of these
metals. The domain has been termed Heavy-Metal-Associated (HMA). It has been found in:

- A variety of cation transport ATPases (E1-E2 ATPases) (see <PDOC00139>). The human copper ATPases ATP7A
and ATP7B, which are respectively involved in Menkes disease and Wilson's disease, both contain 6 tandem copies of
the HMA domain. The copper ATPases CCC2 from budding yeast, copA from Enterococcus faecalis and synA from
Synechococcus contain one copy of the HMA domain. The cadmium ATPases cadA from Bacillus firmus and from
plasmid pI258 from Staphylococcus aureus also contain a single HMA domain, while a chromosomal Staphylococcus
aureus cadA contains two copies. Other, less characterized ATPases that contain the HMA domain are: fixI from
Rhizobium meliloti, pacS from Synechococcus strain PCC 7942, Mycobacterium leprae ctpA and ctpB, and
Escherichia coli hypothetical protein yhhO. In all these ATPases the HMA domain(s) are located in the N-terminal
section.

- Mercuric reductase (EC 1.16.1.1) (gene merA), which is generally encoded by plasmids carried by mercury-resistant
Gram-negative bacteria. Mercuric reductase is a class-I pyridine nucleotide-disulphide oxidoreductase (see
<PDOC00073>). There is generally one HMA domain (with the exception of a chromosomal merA from Bacillus
strain RC607, which has two) in the N-terminal part of merA.
- Mercuric transport protein periplasmic component (gene merP), also encoded by plasmids carried by mercury-
resistant Gram-negative bacteria. It seems to be a mercury scavenger that specifically binds one Hg(2+) ion
and passes it to the mercuric reductase via the merT protein. The N-terminal half of merP is an HMA domain.
- Helicobacter pylori copper-binding protein copP.

- Yeast protein ATX1 [2], which could act in the transport and/or partitioning of copper. 

[1746] The consensus pattern for HMA spans the complete domain. 
[1747] Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-[LIVFG]-x(3)-[LIV]-x(9,11)-[IVA]-x-[LVFYS] [The
two C's probably bind metals]
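The variable spacer x(9,11) maps directly onto a bounded regex quantifier. The minimal Python sketch below assumes a hand translation of the HMA pattern, captures the two conserved cysteines, and tests invented sequences that differ only in the length of that spacer; the name `hma_match` and all sequences are hypothetical.

```python
import re

# Hand translation of the HMA pattern; x(9,11) becomes .{9,11} and the two
# conserved cysteines are captured as groups 1 and 2.
HMA_REGEX = re.compile(
    r"[LIVN].{2}[LIVMFA].(C).[STAGCDNH](C).{3}[LIVFG].{3}[LIV].{9,11}[IVA].[LVFYS]"
)


def hma_match(seq):
    """Return 1-based positions of the two conserved Cys, or None."""
    m = HMA_REGEX.search(seq)
    return None if m is None else (m.start(1) + 1, m.start(2) + 1)


# Made-up sequences; only spacers of 9 to 11 residues satisfy x(9,11).
for spacer in (8, 9, 10, 11, 12):
    seq = "LKKVSCASCKKKVKKKL" + "K" * spacer + "IKL"
    print(spacer, hma_match(seq))  # 8 None, 9-11 (6, 9), 12 None
```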

[1] Bull P.C., Cox D.W. Trends Genet. 10:246-252(1994).

[2] Lin S.-J., Culotta V.C. Proc. Natl. Acad. Sci. U.S.A. 92:3784-3788(1995).

[1748] 756. (Peptidase M10) Matrixins cysteine switch 
PROSITE cross-reference(s): CYSTEINE_SWITCH 

Mammalian extracellular matrix metalloproteinases (EC 3.4.24.-), also known as matrixins [1] (see <PDOC00129>),
are zinc-dependent enzymes. They are secreted by cells in an inactive form (zymogen) that differs from the mature
enzyme by the presence of an N-terminal propeptide. A highly conserved octapeptide is found two residues downstream
of the C-terminal end of the propeptide. This region has been shown to be involved in autoinhibition of matrixins [2,3]:
a cysteine within the octapeptide chelates the active site zinc ion, thus inhibiting the enzyme. This region has been
called the 'cysteine switch' or 'autoinhibitor region'.
A cysteine switch has been found in the following zinc proteases: 

- MMP-1 (EC 3.4.24.7) (interstitial collagenase).
- MMP-2 (EC 3.4.24.24) (72 Kd gelatinase).
- MMP-3 (EC 3.4.24.17) (stromelysin-1).
- MMP-7 (EC 3.4.24.23) (matrilysin).
- MMP-8 (EC 3.4.24.34) (neutrophil collagenase).
- MMP-9 (EC 3.4.24.35) (92 Kd gelatinase).
- MMP-10 (EC 3.4.24.22) (stromelysin-2).
- MMP-11 (EC 3.4.24.-) (stromelysin-3).
- MMP-12 (EC 3.4.24.65) (macrophage metalloelastase).
- MMP-13 (EC 3.4.24.-) (collagenase 3).
- MMP-14 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 1).
- MMP-15 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 2).
- MMP-16 (EC 3.4.24.-) (membrane-type matrix metalloproteinase 3).
- Sea urchin hatching enzyme (EC 3.4.24.12) (envelysin) [4].
- Chlamydomonas reinhardtii gamete lytic enzyme (GLE) [5].

[1749] Description of pattern(s) and/or profile(s)

Consensus pattern: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ] [C chelates the zinc ion]
[1] Woessner J. Jr. FASEB J. 5:2145-2154(1991).

[2] Sanchez-Lopez R., Nicholson R., Gesnel M.C., Matrisian L.M., Breathnach R. J. Biol. Chem. 263:11892-11899
(1988).

[3] Park A.J., Matrisian L.M., Kells A.F., Pearson R., Yuan Z., Navre M. J. Biol. Chem. 266:1584-1590(1991).
[4] Lepage T., Gache C. EMBO J. 9:3003-3012(1990).
[5] Kinoshita T., Fukuzawa H., Shimada T., Saito T., Matsuda Y. Proc. Natl. Acad. Sci. U.S.A. 89:4693-4697(1992).
[1750] 757. (Peptidase S8) Serine proteases, subtilase family, active sites 

PROSITE cross-reference(s): PS00136; SUBTILASE_ASP, PS00137; SUBTILASE_HIS, PS00138; SUBTILASE_SER
Subtilases [1,2] are an extensive family of serine proteases whose catalytic activity is provided by a charge relay system
similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. The
sequences around the residues involved in the catalytic triad (aspartic acid, serine and histidine) are completely different
from those of the analogous residues in the trypsin serine proteases and can be used as signatures specific to that
category of proteases.

The subtilase family currently includes the following proteases: 

- Subtilisins (EC 3.4.21.62); these alkaline proteases from various Bacillus species have been the target of numerous
studies in the past thirty years.
- Alkaline elastase YaB from Bacillus sp. (gene ale).
- Alkaline serine exoprotease A from Vibrio alginolyticus (gene proA).
- Aqualysin I from Thermus aquaticus (gene pstI).
- AspA from Aeromonas salmonicida.
- Bacillopeptidase F (esterase) from Bacillus subtilis (gene bpf).
- C5A peptidase from Streptococcus pyogenes (gene scpA).
- Cell envelope-located proteases PI, PII, and PIII from Lactococcus lactis.
- Extracellular serine protease from Serratia marcescens.
- Extracellular protease from Xanthomonas campestris.
- Intracellular serine protease (ISP) from various Bacillus.
- Minor extracellular serine protease epr from Bacillus subtilis (gene epr).
- Minor extracellular serine protease vpr from Bacillus subtilis (gene vpr).
- Nisin leader peptide processing protease nisP from Lactococcus lactis.
- Serotype-specific antigen 1 from Pasteurella haemolytica (gene ssa1).
- Thermitase (EC 3.4.21.66) from Thermoactinomyces vulgaris.
- Calcium-dependent protease from Anabaena variabilis (gene prcA).
- Halolysin from halophilic bacteria sp. 172p1 (gene hly).
- Alkaline extracellular protease (AEP) from Yarrowia lipolytica (gene xpr2).
- Alkaline proteinase from Cephalosporium acremonium (gene alp).
- Cerevisin (EC 3.4.21.48) (vacuolar protease B) from yeast (gene PRB1).
- Cuticle-degrading protease (pr1) from Metarhizium anisopliae.
- KEX-1 protease from Kluyveromyces lactis.
- Kexin (EC 3.4.21.61) from yeast (gene KEX-2).
- Oryzin (EC 3.4.21.63) (alkaline proteinase) from Aspergillus (gene alp).
- Proteinase K (EC 3.4.21.64) from Tritirachium album (gene proK).
- Proteinase R from Tritirachium album (gene proR).
- Proteinase T from Tritirachium album (gene proT).
- Subtilisin-like protease III from yeast (gene YSP3).
- Thermomycolin (EC 3.4.21.65) from Malbranchea sulfurea.

- Furin (EC 3.4.21.75), neuroendocrine convertases 1 to 3 (NEC-1 to -3) and PACE4 protease from mammals, other
vertebrates, and invertebrates. These proteases are involved in the processing of hormone precursors at sites
comprised of pairs of basic amino acid residues [3].

- Tripeptidyl-peptidase II (EC 3.4.14.10) (tripeptidyl aminopeptidase) from human.

- Prestalk-specific proteins tagB and tagC from slime mold [4]. Both proteins consist of two domains: an N-terminal
subtilase catalytic domain and a C-terminal ABC transporter domain (see <PDOC00185>).

[1751] Description of pattern(s) and/or profile(s)

Consensus pattern: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH] [D is the active site residue]
Consensus pattern: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM] [H is the active site residue]
Consensus pattern: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG] [S is the active site residue]

Note: if a protein includes at least two of the three active site signatures, the probability of it being a serine protease
from the subtilase family is 100%.
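The two-of-three rule in the note above can be expressed as a small screen. The sketch below hand-translates the three subtilase signatures into Python regular expressions and flags a sequence when at least two of them match. The names `SUBTILASE_SIGNATURES`, `subtilase_hits` and `looks_like_subtilase` and all fragments are hypothetical, and the 100% figure is the source's empirical claim, which the code simply assumes rather than verifies.

```python
import re

# Hand translations of the three subtilase active-site patterns.
SUBTILASE_SIGNATURES = {
    "Asp": re.compile(r"[STAIV].[LIVMF][LIVM]D[DSTA]G[LIVMFC].{2,3}[DNH]"),
    "His": re.compile(r"HG[STM].[VIC][STAGC][GS].[LIVMA][STAGCLV][SAGM]"),
    "Ser": re.compile(r"GTS.[SA].P.{2}[STAVC][AG]"),
}


def subtilase_hits(seq):
    """Return the names of the active-site signatures found in seq."""
    return [name for name, rx in SUBTILASE_SIGNATURES.items() if rx.search(seq)]


def looks_like_subtilase(seq):
    """Apply the rule stated in the note: at least two of the three signatures."""
    return len(subtilase_hits(seq)) >= 2


# Made-up fragments, each built to satisfy exactly one signature.
asp_site = "AKVIDTGIKKN"
his_site = "HGTKVAGKIAA"
ser_site = "GTSKAKPKKSA"

print(subtilase_hits(asp_site + "QQQQ" + ser_site))  # ['Asp', 'Ser']
print(looks_like_subtilase(his_site + "QQQQ"))       # False
```

Requiring two independent hits trades some sensitivity for specificity: a sequence that carries only one signature is reported but not classified.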

Note: these proteins belong to family S8 in the classification of peptidases [5,E1].

[1] Siezen R.J., de Vos W.M., Leunissen J.A.M., Dijkstra B.W. Protein Eng. 4:719-737(1991).
[2] Siezen R.J. (In) Proceedings subtilisin symposium, Hamburg, (1992).
[3] Barr P.J. Cell 66:1-3(1991).

[4] Shaulsky G., Kuspa A., Loomis W.F. Genes Dev. 9:1111-1122(1995).
[5] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).

[1752] 758. (SSB) Single-strand binding protein family signatures
PROSITE cross-reference(s): PS00735; SSB_1, PS00736; SSB_2

The Escherichia coli single-strand binding protein [1] (gene ssb), also known as the helix-destabilizing protein, is a
protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important
role in DNA replication, recombination and repair.

[1753] Closely related variants of SSB are encoded in the genome of a variety of large self-transmissible plasmids.
SSB has also been characterized in bacteria such as Proteus mirabilis or Serratia marcescens.
[1754] Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication
are structurally and evolutionarily related to prokaryotic SSB. Proteins currently known to belong to this subfamily are
listed below [2].

- Mammalian protein Mt-SSB (P16).

- Xenopus Mt-SSBs and Mt-SSBr.
- Drosophila mtSSB.

- Yeast protein RIM1.

[1755] Two signature patterns have been developed for these proteins. The first is a conserved region in the N-
terminal section of the SSBs. The second is a centrally located region which, in Escherichia coli SSB, is known to be
involved in the binding of DNA.
[1756] Description of pattern(s) and/or profile(s)

Consensus pattern: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DET]
Consensus pattern: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR]

[1] Meyer R.R., Laine P.S. Microbiol. Rev. 54:342-380(1990).
[2] Stroumbakis N.D., Li Z., Tolias P.P. Gene 143:171-177(1994).

[1757] 759. KDPG and KHG aldolases active site signatures 

PROSITE cross-reference(s): PS00159; ALDOLASE_KDPG_KHG_1, PS00160; ALDOLASE_KDPG_KHG_2
[1758] 4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) (KHG-aldolase) catalyzes the interconversion of 4-hydroxy-
2-oxoglutarate into pyruvate and glyoxylate. Phospho-2-dehydro-3-deoxygluconate aldolase (EC 4.1.2.14) (KDPG-
aldolase) catalyzes the interconversion of 6-phospho-2-dehydro-3-deoxy-D-gluconate into pyruvate and glyceralde-
hyde 3-phosphate.

[1759] These two enzymes are structurally and functionally related [1]. They are both homotrimeric proteins of
approximately 220 amino-acid residues. They are class I aldolases whose catalytic mechanism involves the formation
of a Schiff-base intermediate between the substrate and the epsilon-amino group of a lysine residue. In both enzymes,
an arginine is required for catalytic activity.

[1760] Two signature patterns were developed for these enzymes. The first one contains the active site arginine and
the second, the lysine involved in the Schiff-base formation.
[1761] Description of pattern(s) and/or profile(s) 

Consensus pattern: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R [R is the active site residue]
Consensus pattern: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G [K is involved in Schiff-base formation]

[1762] [1] Vlahos C.J., Dekker E.E. J. Biol. Chem. 263:11683-11691(1988).

[1763] 760. AP endonucleases family 1 signatures, PROSITE cross-reference(s): PS00726; AP_NUCLEASE_F1_1,
PS00727; AP_NUCLEASE_F1_2, PS00728; AP_NUCLEASE_F1_3

[1764] DNA damaging agents such as the antitumor drugs bleomycin and neocarzinostatin, or those that generate
oxygen radicals, produce a variety of lesions in DNA. Amongst these is base loss, which forms apurinic/apyrimidinic
(AP) sites, or strand breaks with atypical 3'-termini. DNA repair at the AP sites is initiated by specific endonuclease
cleavage of the phosphodiester backbone. Such endonucleases are also generally capable of removing blocking
groups from the 3'-terminus of DNA strand breaks.

[1765] AP endonucleases can be classified into two families on the basis of sequence similarity. Family 1 groups the enzymes listed below [1]:
- Escherichia coli exonuclease III (EC 3.1.11.2) (gene xthA).

- Streptococcus pneumoniae and Bacillus subtilis exonuclease A (gene exoA).

- Mammalian AP endonuclease 1 (AP1) (EC 4.2.99.18).

- Drosophila recombination repair protein 1 (gene Rrp1).

- Arabidopsis thaliana apurinic endonuclease-redox protein (gene arp).

[1766] Except for Rrp1 and arp, these enzymes are proteins of about 300 amino-acid residues. Rrp1 and arp both contain additional and unrelated sequences in their N-terminal section (about 400 residues for Rrp1 and 270 for arp).

[1767] Three signature patterns were developed for this family of enzymes. The patterns are based on the most conserved regions. The first pattern contains a glutamate which, in the Escherichia coli enzyme, has been shown [2] to bind a divalent metal ion such as magnesium or manganese.

[1768] Consensus pattern [APF]-D-[LIVMF](2)-x-[LIVM]-Q-E-x-K [E binds a divalent metal ion]
Consensus pattern D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
Consensus pattern N-x-G-x-R-[LIVM]-D-[LIVMFYH]-x-[LV]-x-S
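The x(7,8) element in the second pattern allows a spacer of either seven or eight arbitrary residues. A minimal Python sketch, assuming a hand translation of the pattern into a regular expression (the names AP_F1_2 and spacer_length and the test strings are hypothetical), checks which spacer lengths are accepted:

```python
import re

# Hand translation of the second AP endonuclease family 1 pattern
#   D-[ST]-[FY]-R-[KH]-x(7,8)-[FYW]-[ST]-[FYW](2)
# x(7,8) becomes .{7,8} and [FYW](2) becomes [FYW]{2}.
AP_F1_2 = re.compile(r"D[ST][FY]R[KH].{7,8}[FYW][ST][FYW]{2}")


def spacer_length(seq: str):
    """Return the length of the x(7,8) spacer in the first match, or None."""
    m = AP_F1_2.search(seq)
    if m is None:
        return None
    # Five fixed residues precede the spacer and four follow it.
    return len(m.group()) - 9


if __name__ == "__main__":
    for n in (6, 7, 8, 9):
        toy = "DSFRK" + "A" * n + "WSFY"  # synthetic test strings
        print(n, spacer_length(toy))
```

With these synthetic strings, spacers of 7 and 8 residues are reported, while spacers of 6 and 9 residues give None.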

[ 1] Barzilay G., Hickson I.D. BioEssays 17:713-719(1995).

[ 2] Mol C.D., Kuo C.-F., Thayer M.M., Cunningham R.P., Tainer J.A. Nature 374:381-386(1995).

[1769] 761. (ER) Enhancer of rudimentary signature, PROSITE cross-reference(s): PS01290; ER

[1770] The Drosophila protein 'enhancer of rudimentary' (gene e(r)) is a small protein of 104 residues whose function is not yet clear. From an evolutionary point of view, it is highly conserved [1] and has been found to exist in probably all multicellular eukaryotic organisms. It has been proposed that this protein plays a role in the cell cycle.

[1771] A conserved region in the central part of the protein was selected as a signature pattern.

[1772] Consensus pattern Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S

[1773] [ 1] Gelsthorpe M., Pulumati M., McCallum C., Dang-Vu K., Tsubota S.I. Gene 186:189-195(1997).

[1774] 762. Electron transfer flavoprotein alpha-subunit signature, PROSITE cross-reference(s): PS00696; ETF_ALPHA

[1775] The electron transfer flavoprotein (ETF) [1,2] serves as a specific electron acceptor for various mitochondrial dehydrogenases. ETF transfers electrons to the main respiratory chain via ETF-ubiquinone oxidoreductase. ETF is a heterodimer that consists of an alpha and a beta subunit and binds one molecule of FAD per dimer. A similar system also exists in some bacteria.

[1776] The alpha subunit of ETF is a protein of about 32 Kd which is structurally related to the bacterial nitrogen fixation protein fixB, which could play a role in a redox process and feed electrons to ferredoxin.
[1777] Other related proteins are: 

- Escherichia coli hypothetical protein ydiR.
- Escherichia coli hypothetical protein ygcQ.

[1778] A highly conserved region located in the C-terminal section was selected as a signature pattern for these proteins.

[1779] Consensus pattern [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-x(2)-G-x(6)-[IV]-x-A-[IV]-N

[ 1] Finocchiaro G., Ikeda Y., Ito M., Tanaka K. Prog. Clin. Biol. Res. 321:637-652(1990).
[ 2] Tsai M.H., Saier M.H. Jr. Res. Microbiol. 146:397-404(1995).

[1780] 763. (lectin c) C-type lectin domain signature and profile 

PROSITE cross-reference(s): PS00615; C_TYPE_LECTIN_1, PS50041; C_TYPE_LECTIN_2
[1781] A number of different families of proteins share a conserved domain which was first characterized in some animal lectins and which seems to function as a calcium-dependent carbohydrate-recognition domain [1,2,3]. This domain, which is known as the C-type lectin domain (CTL) or as the carbohydrate-recognition domain (CRD), consists of about 110 to 130 residues. There are four cysteines which are perfectly conserved and involved in two disulfide bonds. A schematic representation of the CTL domain is shown below.
xcxxxxcxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxWxCxxxxCx

'C': conserved cysteine involved in a disulfide bond.
'c': optional cysteine involved in a disulfide bond.



[1782] The categories of proteins in which the CTL domain has been found are listed below.

[1783] Type-II membrane proteins where the CTL domain is located at the C-terminal extremity of the proteins:

- Asialoglycoprotein receptors (ASGPR) (also known as hepatic lectins) [4]. The ASGPRs mediate the endocytosis of plasma glycoproteins from which the terminal sialic acid residue of their carbohydrate moieties has been removed.

- Low affinity immunoglobulin epsilon Fc receptor (lymphocyte IgE receptor), which plays an essential role in the 
regulation of IgE production and in the differentiation of B cells. 

- Kupffer cell receptor. A receptor with an affinity for galactose and fucose that could be involved in endocytosis.

- A number of proteins expressed on the surface of natural killer T-cells (NKG2, NKR-P1, YE1/88 (Ly-49), CD69) and on B-cells (CD72, LyB-2). The CTL domain in these proteins is distantly related to other CTL domains; it is unclear whether they are likely to bind carbohydrates.

[1784] Proteins that consist of an N-terminal collagenous domain followed by a CTL domain [5]; these proteins are sometimes called 'collectins':

- Pulmonary surfactant-associated protein A (SP-A). SP-A is a calcium-dependent protein that binds to surfactant phospholipids and contributes to lowering the surface tension at the air-liquid interface in the alveoli of the mammalian lung.

- Pulmonary surfactant-associated protein D (SP-D).

- Conglutinin, a calcium-dependent lectin-like protein which binds to a yeast cell wall extract and to immune complexes through the complement component (iC3b).

- Mannan-binding proteins (MBP) (also known as mannose-binding proteins). MBPs bind mannose and N-acetyl-D-glucosamine in a calcium-dependent manner.

- Bovine collectin-43 (CL-43). 

[1785] Selectins (or LEC-CAM) [6,7]. Selectins are cell adhesion molecules implicated in the interaction of leukocytes with platelets or vascular endothelium. Structurally, selectins consist of a long extracellular domain, followed by a transmembrane region and a short cytoplasmic domain. The extracellular domain is itself composed of a CTL domain followed by an EGF-like domain and a variable number of SCR/Sushi repeats. Known selectins are:

- Lymph node homing receptor (also known as L-selectin, leukocyte adhesion molecule-1 (LAM-1), Leu-8, gp90-mel, or LECAM-1).
- Endothelial leukocyte adhesion molecule 1 (ELAM-1, E-selectin, or LECAM-2).
[1786] The ligand recognized by ELAM-1 is sialyl-Lewis x.

- Granule membrane protein 140 (GMP-140, P-selectin, PADGEM, CD62, or LECAM-3). The ligand recognized by GMP-140 is Lewis x.

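[1787] Proteoglycan core proteins. These proteins contain, in their C-terminal section, a CTL domain followed by one copy of a SCR/Sushi repeat. Known proteins of this type are:

- Aggrecan (cartilage-specific proteoglycan core protein), a major component of the extracellular matrix of cartilaginous tissues, where it has a role in the resistance to compression.

- Brevican.

- Neurocan.

- Versican (large fibroblast proteoglycan).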

[1788] In addition to the CTL and Sushi domains, these proteins also contain, in their N-terminal domain, an Ig-like V-type region, two or four link domains (see <PDOC00955>) and up to two EGF-like repeats.
[1789] Two type-I membrane proteins:

- Mannose receptor from macrophages. This protein mediates the endocytosis of glycoproteins by macrophages in several recognition and uptake processes. Its extracellular section consists of a fibronectin type II domain followed by eight tandem repeats of the CTL domain.

- 180 Kd secretory phospholipase A2 receptor (PLA2-R), a protein whose structure is highly similar to that of the mannose receptor.

- DEC-205 receptor. This protein is used by dendritic and thymic epithelial cells to capture and endocytose diverse carbohydrate-binding antigens and direct them to antigen-processing cellular compartments.

■ dirrs'Scr^^^^^^^ 

domain, 2 VWFC domains (see <PDOC00928>), and a CTCK (see <PDOC00912>).
[1790] Various other proteins that uniquely consist of a CTL domain:

- Invertebrate soluble galactose-binding lectins. A category to which belong a humoral lectin from a flesh fly, echinoidin from a sea urchin, BRA-2 and BRA-3 (two lectins from the coelomic fluid of a barnacle), a lectin from the tunicate Polyandrocarpa misakiensis and a newt oviduct lectin. The physiological importance of these lectins is not known, but they may play an important role in defense mechanisms.

- Pancreatic stone protein (PSP) (also known as pancreatic thread protein (PTP) or reg), a protein that might act as an inhibitor of spontaneous calcium carbonate precipitation.

- Pancreatitis-associated protein (PAP), a protein that might be involved in the control of bacterial proliferation.

- Tetranectin, a plasma protein that binds to plasminogen and to isolated kringle 4.

- Eosinophil granule major basic protein (MBP), a cytotoxic protein. 

- A galactose-specific lectin from a rattlesnake.

- Both subunits of coagulation factor IX/factor X-binding protein (IX/X-bp), a snake venom anticoagulant protein which binds with factors IX and X in the presence of calcium.

- Two subunits of a phospholipase A2 inhibitor from the plasma of a snake (PLI-A and PLI-B).

- A lipopolysaccharide-binding protein (LPS-BP) from the hemolymph of a cockroach [8].

- Sea raven antifreeze protein (AFP) [9]. 

[1791] The C-terminal region of the domain, with its three conserved cysteines, was selected as a signature pattern.

Sr;?kr=LS 

Note: this documentation entry is linked to both a signature pattern and a profile. As the profile is much more sensitive than the pattern, you should use it if you have access to the necessary software tools to do so.

[ 1] Drickamer K. J. Biol. Chem. 263:9557-9560(1988). 
[ 2] Drickamer K. Prog. Nucleic Acid Res. Mol. Biol. 45:207-232(1993). 
[ 3] Drickamer K. Curr. Opin. Struct. Biol. 3:393-400(1993).

[ 4] Spiess M. Biochemistry 29:10009-10018(1990). 

[ 5] Weis W.I., Kahn R., Fourme R., Drickamer K., Hendrickson W.A. Science 254:1608-1615(1991).
[ 6] Siegelman M. Curr. Biol. 1:125-128(1991).
[ 7] Lasky L.A. Science 258:964-969(1992).
[ 8] Jomori T., Natori S. J. Biol. Chem. 266:13318-13323(1991).

[ 9] Ng N.F.L., Hew C.-L. J. Biol. Chem. 267:16069-16075(1992).

[1793] 764. (SRCR) Speract receptor repeated domain signature 

PROSITE cross-reference(s): PS00420; SPERACT_RECEPTOR
[1794] The receptor for the sea urchin egg peptide speract is a transmembrane glycoprotein of 500 amino acid residues [1]. Structurally, it consists of a large extracellular domain of 450 residues, followed by a transmembrane region and a small cytoplasmic domain of 12 amino acids. The extracellular domain contains four repeats of a 115-amino-acid domain. There are 17 positions that are perfectly conserved in the four repeats; among them are six cysteines, six glycines, and three glutamates.
[1795] Such a domain is also found, once, in the C-terminal section of mammalian macrophage scavenger receptor type I [2], a membrane glycoprotein implicated in the pathologic deposition of cholesterol in arterial walls during atherogenesis.

[1796] The signature pattern that was derived spans part of the N-terminal section of the domain and contains 8 of the 17 conserved residues.

[1797] Consensus pattern G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-x(3)-G

[ 1] Dangott L.J., Jordan J.E., Bellet R.A., Garbers D.L. Proc. Natl. Acad. Sci. U.S.A. 86:2128-2132(1989).
[ 2] Freeman M., Ashkenas J., Rees D.J., Kingsley D.M., Copeland N.G., Jenkins N.A., Krieger M. Proc. Natl. Acad. Sci. U.S.A. 87:8810-8814(1990).

[1798] 765. Bac_surface_Ag 
Bacterial surface antigen 

This entry includes the following surface antigens: D15 antigen from H. influenzae, OMA87 from P. multocida, and OMP85 from N. meningitidis and N. gonorrhoeae. Number of members: 14
[1] Medline: 95255676. The sequencing of the 80-kDa D15 protective surface antigen of Haemophilus influenzae. Flack FS, Loosmore S, Chong P, Thomas WR; Gene 1995;156:97-99.

[2] Medline: 96333354. Cloning, sequencing, expression, and protective capacity of the oma87 gene encoding the Pasteurella multocida 87-kilodalton outer membrane antigen. Ruffolo CG, Adler B; Infect Immun 1996;64:3161-3167.



[1799] 766. BRCA1 C Terminus (BRCT) domain 

The BRCT domain is found predominantly in proteins involved in cell cycle checkpoint functions responsive to DNA damage. It has been suggested that the retinoblastoma protein contains a divergent BRCT domain; this has not been included in this family. The BRCT domain of XRCC1 forms a homodimer in the crystal structure (Medline: 99016060). This suggests that pairs of BRCT domains associate as homo- or heterodimers. Number of members: 131

[1] Medline: 96259550. BRCA1 protein products ...Functional motifs... Koonin EV, Altschul SF, Bork P; Nature Genet 1996;13:266-268.

[2] Medline: 97153217. From BRCA1 to RAP1: a widespread BRCT module closely associated with DNA repair. Callebaut I, Mornon JP; FEBS Lett 1997;400:25-30.

[3] Medline: 97186552. A superfamily of conserved domains in DNA damage-responsive cell cycle checkpoint proteins. Bork P, Hofmann K, Bucher P, Neuwald AF, Altschul SF, Koonin EV; FASEB J 1997;11:68-76.
[4] Medline: 97402527. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ; Nucleic Acids Res 1997;25:3389-3402.

[5] Medline: 99016060. Structure of an XRCC1 BRCT domain: a new protein-protein interaction module. Zhang X, Morera S, Bates PA, Whitehead PC, Coffer AI, Hainbucher K, Nash RA, Sternberg MJ, Lindahl T, Freemont PS;
[1800] 767. Kappa casein 

Kappa-casein is a mammalian milk protein involved in a number of important physiological processes. In the gut, the ingested protein is split into an insoluble peptide (para-kappa-casein) and a soluble hydrophilic glycopeptide (caseinomacropeptide).

Caseinomacropeptide is responsible for increased efficiency of digestion, prevention of neonate hypersensitivity to ingested proteins, and inhibition of gastric pathogens. Number of members: 56

[1801] [1] Medline: 98072500. Nucleotide sequence evolution at the kappa-casein locus: evidence for positive selection within the family Bovidae. Ward TJ, Honeycutt RL, Derr JN; Genetics 1997;147:1863-1872.
[1802] 768. Chitinases family 18 active site 
PROSITE cross-reference(s): CHITINASE_18

Chitinases (EC 3.2.1.14) [1] are enzymes that catalyze the hydrolysis of the beta-1,4-N-acetyl-D-glucosamine linkages in chitin polymers. From the viewpoint of sequence similarity, chitinases belong to either family 18 or family 19 in the classification of glycosyl hydrolases [2,E1]. Chitinases of family 18 (also known as classes III or V) group a variety of proteins:

a) Chitinases from: 

- Prokaryotes such as Alteromonas, Bacillus, Serratia, Streptomyces, etc.
- Plants such as Arabidopsis, cucumber, bean, tobacco, etc.
- Fungi such as Aphanocladium, Rhizopus, Saccharomyces, etc.
- Nematode (Brugia malayi).
- Insects (Manduca sexta).

- Baculoviruses (Autographa californica nuclear polyhedrosis virus).

b) Other proteins: 

- Hevamine, a rubber tree protein with chitinase and lysozyme activities.
- Kluyveromyces lactis killer toxin alpha subunit, which acts as a chitinase.
- Flavobacterium and Streptomyces endo-beta-N-acetylglucosaminidases (EC 3.2.1.96).

- Mammalian di-N-acetylchitobiase which is involved in the degradation of asparagine-linked glycoproteins. 

- Human cartilage glycoprotein Gp-39. 

- Jack bean concanavalin B (conB), a protein that has lost its catalytic activity. 

[1803] Site-directed mutagenesis experiments [3] and crystallographic data [4,5] have shown that a conserved glutamate is involved in the catalytic mechanism and probably acts as a proton donor. This glutamate is at the extremity of the best conserved region in these proteins.

[1804] Consensus pattern [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E [E is the active site residue]

[ 1] Flach J., Pilet P.-E., Jolles P. Experientia 48:701-716(1992).

[ 2] Henrissat B. Biochem. J. 280:309-316(1991). 

[ 3] Watanabe T., Kobori K., Miyashita K., Fujii T., Sakai H., Uchida M., Tanaka H. J. Biol. Chem. 268:18567-18572(1993).

[ 4] Perrakis A., Tews I., Dauter Z., Oppenheim A.B., Chet I., Wilson K.S., Vorgias C.E. Structure 2:1169-1180(1994).

[5] van Scheltinga A.C.T, Kalk K.H., Beintema J.J., Dijkstra B.W. Structure 2:1181-1189(1994). 
[1805] 769. gag_p17, gag gene protein p17 (matrix protein).

The matrix protein forms an icosahedral shell associated with the inner membrane of the mature immunodeficiency virus. Number of members: 1598

[1806] [1] Medline: 95055757. Three-dimensional structure of the human immunodeficiency virus type 1 matrix protein. Massiah MA, Starich MR, Paschall C, Summers MF, Christensen AM, Sundquist WI; J Mol Biol 1994;244:198-223.
[1807] 770. GDA1/CD39 family of nucleoside phosphatases signature 
PROSITE cross-reference(s): GDA1_CD39_NTPASE

A number of nucleoside diphosphate and triphosphate hydrolases, as well as some yet uncharacterized proteins, have been found to belong to the same family [1,2]. This family currently consists of:

- Yeast guanosine-diphosphatase (EC 3.6.1.42) (GDPase) (gene GDA1). GDA1 is a Golgi integral membrane enzyme that catalyzes the hydrolysis of GDP to GMP.

- Potato apyrase (EC 3.6.1.5) (adenosine diphosphatase) (ADPase). Apyrase acts on both ATP and ADP to produce AMP.

- Mammalian vascular ATP-diphosphohydrolase.

- Toxoplasma gondii nucleoside-triphosphatases (EC 3.6.1.15) (NTPase). NTPase hydrolyses various nucleoside triphosphates to produce the corresponding nucleoside mono- and diphosphates. This enzyme is secreted into the parasitophorous vacuole, a specialized compartment where the parasite resides intracellularly.

- Pea nucleoside-triphosphatases (EC 3.6.1.15) (NTPase).

- Caenorhabditis elegans hypothetical protein C33H5.14.
- Caenorhabditis elegans hypothetical protein R07E4.4.

- Yeast chromosome V hypothetical protein YER005w. 

[1808] The above uncharacterized proteins all seem to be membrane-bound.

[1809] These proteins share several conserved domains. The best conserved of these domains has been selected as a signature pattern; it is located in the central section of the proteins.

[1810] Consensus pattern [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY]
[ 1] Handa M., Guidotti G. Biochem. Biophys. Res. Commun. 218:916-923(1996).

[ 2] Vasconcelos E.G., Ferreira S.T., de Carvalho T.M.U., de Souza W., Kettlun A.M., Mancilla M., Valenzuela M.A., Verjovski-Almeida S. J. Biol. Chem. 271:22139-22145(1996).

[1811] 771. GTP cyclohydrolase I signatures 

PROSITE cross-reference(s): GTP_CYCLOHYDROL_1_1, GTP_CYCLOHYDROL_1_2

GTP cyclohydrolase I (EC 3.5.4.16) catalyzes the biosynthesis of formic acid and dihydroneopterin triphosphate from GTP. This reaction is the first step in the biosynthesis of tetrahydrofolate in prokaryotes, of tetrahydrobiopterin in vertebrates, and of pteridine-containing pigments in insects.
[1812] GTP cyclohydrolase I is a protein of 190 to 250 amino acid residues. Comparison of the sequences of the enzyme from bacterial and eukaryotic sources shows that its structure has been extremely well conserved throughout evolution [1].

[1813] Two conserved regions were selected as signature patterns. The first contains a perfectly conserved tetrapeptide which is part of the GTP-binding pocket [2]; the second region also contains conserved residues involved in GTP binding.

[1814] Consensus pattern [DEN]-[LIVM](2)-x(2)-[KRNQ]-[DEN]-[LIVM]-x(3)-[ST]-x-C-E-H-H
Consensus pattern [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN]

7Jl7?f(1995T"'' ^ " ^ " ^°P^y^- Commun. 212: 

[ 2] Nar H., Huber R., Meining W., Schmid C., Weinkauf S., Bacher A. Structure 3:459-466(1995).
[1815] 772. IlvC, Acetohydroxy acid isomeroreductase

Acetohydroxy acid isomeroreductase catalyses the conversion of acetohydroxy acids into dihydroxy valerates. This reaction is the second step in the biosynthesis of the essential branched-chain amino acids valine and isoleucine.

[1816] [1] The crystal structure of plant acetohydroxy acid isomeroreductase complexed with NADPH, two magnesium ions and a herbicidal transition state analog determined at 1.65 A resolution. Biou V, Dumas R, Cohen-Addad C, Douce R, Job D, Pebay-Peyroula E; EMBO J 1997;16:3405-3415.
[1817] 773. Prokaryotic membrane lipoprotein lipid attachment site 
PROSITE cross-reference(s): PROKAR_LIPOPROTEIN

Prokaryotic membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins known to undergo such processing currently include (for recent listings see [1,2,3]):

- Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
- Escherichia coli lipoprotein-28 (gene nlpA).
- Escherichia coli lipoprotein-34 (gene nlpB).
- Escherichia coli lipoprotein nlpC.
- Escherichia coli lipoprotein nlpD.
- Escherichia coli osmotically inducible lipoprotein B (gene osmB).
- Escherichia coli osmotically inducible lipoprotein E (gene osmE).
- Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
- Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
- Escherichia coli copper homeostasis protein cutF (or nlpE).
- Escherichia coli plasmids traT proteins.
- Escherichia coli Col plasmids lysis proteins.
- A number of Bacillus beta-lactamases.
- Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
- Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
- Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
- Chlamydia trachomatis outer membrane protein 3 (gene omp3).
- Fibrobacter succinogenes endoglucanase cel-3.
- Haemophilus influenzae proteins Pal and Pcp.
- Klebsiella pullulanase (gene pulA).
- Klebsiella pullulanase secretion protein pulS.
- Mycoplasma hyorhinis protein p37.
- Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
- Neisseria outer membrane protein H.8.
- Pseudomonas aeruginosa lipopeptide (gene lppL).
- Pseudomonas solanacearum endoglucanase egl.
- Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
- Rickettsia 17 Kd antigen.
- Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
- Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
- Treponema pallidum 34 Kd antigen.
- Treponema pallidum membrane protein A (gene tmpA).
- Vibrio harveyi chitobiase (gene chb).
- Yersinia virulence plasmid protein yscJ.
- Halocyanin from Natrobacterium pharaonis [4], a membrane-associated copper-binding protein. This is the first archaebacterial protein known to be modified in such a fashion.

[1818] From the precursor sequences of all these proteins, we derived a consensus pattern and a set of rules to identify this type of post-translational modification.

[1819] Consensus pattern {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment site]
Additional rules: 1) The cysteine must be between positions 15 and 35 of the sequence under consideration. 2) There must be at least one Lys or one Arg in the first seven positions of the sequence.
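The pattern and the two additional rules can be combined into a simple scanner. The Python sketch below is a hypothetical illustration (LIPOBOX, lipidation_sites and the test sequences are not from the source); it assumes that positions are counted from 1 at the start of the precursor and that "between positions 15 and 35" is inclusive.

```python
import re

# Hand translation of the lipid attachment pattern
#   {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C
# {DERK} means "any residue except D, E, R or K". The lookahead lets
# finditer examine overlapping candidate windows.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def lipidation_sites(seq: str) -> list:
    """Return 1-based positions of cysteines that fit the pattern and both rules."""
    # Rule 2: at least one Lys or one Arg in the first seven positions.
    if not any(residue in "KR" for residue in seq[:7]):
        return []
    sites = []
    for m in LIPOBOX.finditer(seq):
        # The window is 11 residues long and ends with the cysteine, so the
        # 0-based window start plus 11 is the cysteine's 1-based position.
        cys_pos = m.start() + len(m.group(1))
        # Rule 1: the cysteine must lie between positions 15 and 35 (inclusive).
        if 15 <= cys_pos <= 35:
            sites.append(cys_pos)
    return sites


if __name__ == "__main__":
    print(lipidation_sites("MKKLLLTLLALSLAGCSSHQ"))
```

For this synthetic sequence the scanner reports the cysteine at position 16. Removing the lysines from the first seven positions, or shortening the sequence so that the cysteine falls at position 14, yields an empty list.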

[ 1] Hayashi S., Wu H.C. J. Bioenerg. Biomembr. 22:451-471(1990).
[ 2] Klein P., Somorjai R.L., Lau P.C.K. Protein Eng. 2:15-20(1988).
[ 3] von Heijne G. Protein Eng. 2:531-534(1989).

[ 4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M. J. Biol. Chem. 269:14939-14945(1994).

[1820] 774. Aminoacyl-transfer RNA synthetases class-ll signatures 

PROSITE cross-reference(s): AA_TRNA_LIGASE_II_1; AA_TRNA_LIGASE_II_2

Aminoacyl-tRNA synthetases (EC 6.1.1.-) [1] are a group of enzymes which activate amino acids and transfer them to specific tRNA molecules as the first step in protein biosynthesis. In prokaryotic organisms there are at least twenty different types of aminoacyl-tRNA synthetases, one for each different amino acid. In eukaryotes there are generally two aminoacyl-tRNA synthetases for each different amino acid: one cytosolic form and a mitochondrial form. While all these enzymes have a common function, they are widely diverse in terms of subunit size and of quaternary structure.

[1821] The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, lysine, phenylalanine, proline, serine, and threonine are referred to as class-II synthetases [2 to 6] and probably have a common folding pattern in their catalytic domain for the binding of ATP and amino acid, which is different from the Rossmann fold observed for the class I synthetases [7].
[1822] Class-II tRNA synthetases do not share a high degree of similarity; however, at least three conserved regions are present [2,5,8]. Signature patterns from two of these regions have been derived.

[1823] Consensus pattern [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE]

Consensus pattern [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTA^

[ 1] Schimmel P. Annu. Rev. Biochem. 56:125-158(1987).
[ 2] Delarue M., Moras D. BioEssays 15:675-687(1993).
[ 3] Schimmel P. Trends Biochem. Sci. 16:1-3(1991).

[ 4] Nagel G.M., Doolittle R.F. Proc. Natl. Acad. Sci. U.S.A. 88:8121-8125(1991).
[ 5] Cusack S., Haertlein M., Leberman R. Nucleic Acids Res. 19:3489-3498(1991).
[ 6] Cusack S. Biochimie 75:1077-1081(1993).

[ 7] Cusack S., Berthet-Colominas C., Haertlein M., Nassar N., Leberman R. Nature 347:249-255(1990).
[ 8] Leveque F., Plateau P., Dessen P., Blanquet S. Nucleic Acids Res. 18:305-312(1990).

[1824] 775. X. Trans-activation protein X 

This protein is found in hepadnaviruses where it is indispensable for replication. Number of members: 91 
[1825] 776. Thymidylate synthase active site 

[1826] Thymidylate synthase (EC 2.1.1.45) [1,2] catalyzes the reductive methylation of dUMP to dTMP with concomitant conversion of 5,10-methylenetetrahydrofolate to dihydrofolate. Thymidylate synthase plays an essential role in DNA synthesis and is an important target for certain chemotherapeutic drugs.

[1827] Thymidylate synthase is an enzyme of about 30 to 35 Kd in most species, except in protozoa and plants, where it exists as a bifunctional enzyme that includes a dihydrofolate reductase domain.

[1828] A cysteine residue is involved in the catalytic mechanism (it covalently binds the 5,6-dihydro-dUMP intermediate). The sequence around the active site of this enzyme is conserved from phages to vertebrates.

[1829] Consensus pattern R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV] [C is the active site residue]
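Because the entry flags the cysteine as the active site residue, it can be useful to report that residue's position rather than only the presence of a match. A minimal Python sketch, assuming a hand translation of the pattern with a named capture group (TS_SITE and active_site_cysteine are hypothetical names, and the test sequence is synthetic):

```python
import re

# Hand translation of the thymidylate synthase pattern
#   R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV]
# with the active-site cysteine captured in the named group "cys".
TS_SITE = re.compile(
    r"R.{2}[LIVM].{3}[FW][QN].{8,9}[LV].P(?P<cys>C)[HAVM].{3}[QMT][FYW].[LV]"
)


def active_site_cysteine(seq: str):
    """Return the 1-based position of the pattern's cysteine, or None if absent."""
    m = TS_SITE.search(seq)
    return None if m is None else m.start("cys") + 1


if __name__ == "__main__":
    # Synthetic sequence built to fit the pattern; not a real thymidylate synthase.
    toy = "MS" + "RAALAAAWQ" + "A" * 8 + "LAPCHAAAQFAL"
    print(active_site_cysteine(toy))
```

For this synthetic sequence the cysteine falls at position 23; with a nine-residue spacer it moves to position 24, and replacing it with serine gives None.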

[ 1] Benkovic S.J. Annu. Rev. Biochem. 49:227-251(1980).

[ 2] Ross P., O'Gara F., Condon S. Appl. Environ. Microbiol. 56:2156-2163(1990).

[1830] 777. Glycosyl hydrolases family 31 signatures 

[1831] It has been shown [1,2,3,E1] that the following glycosyl hydrolases can, on the basis of sequence similarities, be classified into a single family:

- Lysosomal alpha-glucosidase (EC 3.2.1.20) (acid maltase) is a vertebrate glycosidase active at low pH which hydrolyzes alpha(1->4) and alpha(1->6) linkages in glycogen, maltose, and isomaltose.

- Alpha-glucosidase (EC 3.2.1.20) from the yeast Candida tsukubaensis.

- Alpha-glucosidase (EC 3.2.1.20) (gene malA) from the archaebacterium Sulfolobus solfataricus.

- Intestinal sucrase-isomaltase (EC 3.2.1.48 / EC 3.2.1.10) is a vertebrate membrane-bound, multifunctional enzyme complex which hydrolyzes sucrose, maltose and isomaltose. The sucrase and isomaltase domains of the enzyme are homologous (41% amino acid identity) and have most probably evolved by duplication.

- Glucoamylase 1 (EC 3.2.1.3) (glucan 1,4-alpha-glucosidase) from various fungal species.

- Yeast hypothetical protein YBR229c.

- Fission yeast hypothetical protein SpAC30D11.01c.

[1832] An aspartic acid has been implicated [4] in the catalytic activity of sucrase, isomaltase, and lysosomal alpha-glucosidase. The region around this active residue is highly conserved and can be used as a signature pattern. A second region, which contains two conserved cysteines, has been used as an additional signature pattern.
[1833] Consensus pattern [GF]-[LIVMF]-W-x-D-M-[NSA]-E [D is the active site residue]

Consensus pattern G-[AV]-D-[LIVMTA]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GSA]-[SA]-F-x-P-F-x-R-(D^
[ 1] Henrissat B. Biochem. J. 280:309-316(1991).

[ 2] Kinsella B.T., Hogan S., Larkin A., Cantwell B.A. Eur. J. Biochem. 202:657-664(1991).

[ 3] Naim H.Y., Niermann T., Kleinhans U., Hollenberg C.P., Strasser A.W.M. FEBS Lett. 294:109-112(1991).

[ 4] Hermans M.M.P., Kroos M.A., van Beeumen J., Oostra B.A., Reuser A.J.J. J. Biol. Chem. 266:13507-13512(1991).



[1834] 778. Urease signatures 
[1835] Urease (EC 3.5.1.5) is a nickel-binding enzyme that catalyzes the hydrolysis of urea to carbon dioxide and ammonia [1]. Historically, it was the first enzyme to be crystallized (in 1926). It is mainly found in plant seeds, microorganisms and invertebrates. In plants, urease is a hexamer of identical chains. In bacteria [2], it consists of either two or three different subunits (alpha, beta and gamma).

[1836] Urease binds two nickel ions per subunit; four histidines, an aspartate and a carbamated lysine serve as ligands to these metals; an additional histidine is involved in the catalytic mechanism [3].

[1837] As signatures for this enzyme, two regions were selected: one that contains two histidines that bind one of the nickel ions, and the region of the active site histidine.

[1838] Consensus pattern T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P [The two H's bind nickel]
Consensus pattern [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A [H is the active site residue]
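Named capture groups make it easy to report where the highlighted histidines fall. The Python sketch below is illustrative only: NICKEL_SITE, ACTIVE_SITE and urease_histidines are hypothetical names, the test sequence is synthetic, and it assumes that in the second pattern the fixed H (rather than the [HN] position) is the active-site residue.

```python
import re

# Hand translations of the two urease patterns; named groups mark the
# histidines that the entry singles out.
#   T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P
NICKEL_SITE = re.compile(r"T[AY][GA][GAT][LIVM]D.(?P<h1>H)[LIVM](?P<h2>H).{3}P")
#   [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A
# Assumption: the fixed H (not the [HN] position) is the active-site residue.
ACTIVE_SITE = re.compile(r"[LIVM]{2}[CT](?P<h>H)[HN]L.{3}[LIVM].{2}D[LIVM].FA")


def urease_histidines(seq: str) -> dict:
    """Map each flagged histidine to its 1-based position, for patterns that match."""
    found = {}
    m = NICKEL_SITE.search(seq)
    if m:
        found["nickel_1"] = m.start("h1") + 1
        found["nickel_2"] = m.start("h2") + 1
    m = ACTIVE_SITE.search(seq)
    if m:
        found["active"] = m.start("h") + 1
    return found


if __name__ == "__main__":
    # Synthetic sequence containing one copy of each pattern.
    toy = "MKTAGGLDAHIHAAAP" + "GG" + "LLCHHLAAAIAADLAFA"
    print(urease_histidines(toy))
```

For the synthetic sequence, the nickel-binding histidines are reported at positions 10 and 12 and the active-site histidine at position 22; a sequence matching neither pattern returns an empty dictionary.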

[ 1] Takishima K., Suga T., Mamiya G. Eur. J. Biochem. 175:151-165(1988).

[ 2] Mobley H.L.T., Hausinger R.P. Microbiol. Rev. 53:85-108(1989).

[ 3] Jabri E., Carr M.B., Hausinger R.P., Karplus P.A. Science 268:998-1004(1995).

[1839] 779. Tyrosine specific protein phosphatases signature and profiles 

[1840] Tyrosine specific protein phosphatases (EC 3.1.3.48) (PTPase) [1 to 5] are enzymes that catalyze the removal of a phosphate group attached to a tyrosine residue. These enzymes are very important in the control of cell growth, proliferation, differentiation and transformation. Multiple forms of PTPase have been characterized and can be classified into two categories: soluble PTPases and transmembrane receptor proteins that contain PTPase domain(s). The currently known PTPases are listed below:
[1841] Soluble PTPases. 

- PTPN1 (PTP-1B). 

- PTPN2 (T-cell PTPase; TC-PTP). 

- PTPN3 (H1) and PTPN4 (MEG), enzymes that contain an N-terminal band 4.1-like domain (see <PDOC00566>) and could act at junctions between the membrane and cytoskeleton.

- PTPN5 (STEP). 

- PTPN6 (PTP-1C; HCP; SHP) and PTPN11 (PTP-2C; SH-PTP3; Syp), enzymes which contain two copies of the SH2 domain at their N-terminal extremity. The Drosophila protein corkscrew (gene csw) also belongs to this subgroup.

- PTPN7 (LC-PTP; Hematopoietic protein-tyrosine phosphatase; HePTP).

- PTPN8 (702-PEP). 

- PTPN9 (MEG2). 

- PTPN12 (PTP-G1; PTP-P19). 

- Yeast PTP1. 

- Yeast PTP2, which may be involved in the ubiquitin-mediated protein degradation pathway.

- Fission yeast pyp1 and pyp2, which play a role in inhibiting the onset of mitosis.

- Fission yeast pyp3, which contributes to the dephosphorylation of cdc2.

- Yeast CDC14, which may be involved in chromosome segregation.

- Yersinia virulence plasmid PTPases (gene yopH).

- Autographa californica nuclear polyhedrosis virus 19 Kd PTPase.

[1842] Dual specificity PTPases.

- DUSP1 (PTPN10; MAP kinase phosphatase-1; MKP-1), which dephosphorylates MAP kinase on both Thr-183 and Tyr-185.

- DUSP2 (PAC-1), a nuclear enzyme that dephosphorylates MAP kinases ERK1 and ERK2 on both Thr and Tyr 
residues. 

- DUSP3 (VHR). 

- DUSP4 (HVH2). 

- DUSP5 (HVH3).

- DUSP6 (Pyst1; MKP-3).

- DUSP7 (Pyst2; MKP-X). 

- Yeast MSG5, a PTPase that dephosphorylates MAP kinase FUS3.

- Yeast YVH1. 

- Vaccinia virus H1 PTPase, a dual specificity phosphatase.

[1843] Receptor PTPases.

[1844] Structurally, all known receptor PTPases are made up of a variable-length extracellular domain, followed by a transmembrane region and a C-terminal catalytic cytoplasmic domain. Some of the receptor PTPases contain fibronectin type III (FN-III) repeats, immunoglobulin-like domains, MAM domains or carbonic anhydrase-like domains in their extracellular region. The cytoplasmic region generally contains two copies of the PTPase domain. The first seems to have enzymatic activity, while the second is inactive but seems to affect the substrate specificity of the first. In these domains, the catalytic cysteine is generally conserved but some other, presumably important, residues are not.
[1845] In the following table, the domain structure of known receptor PTPases is shown: 



                                         Extracellular          Intracellular
Receptor PTPase                          Ig   FN-3  CAH   MAM   PTPase
Leukocyte common antigen (LCA) (CD45)    0    2     0     0     2
Leukocyte antigen related (LAR)          3    8     0     0     2
Drosophila DLAR                          3    9     0     0     2
Drosophila DPTP                          2    2     0     0     2
PTP-alpha (LRP)                          0    0     0     0     2
PTP-beta                                 0    16    0     0     1
PTP-gamma                                0    1     1     0     2
PTP-delta                                0    >7    0     0     2
PTP-epsilon                              0    0     0     0     2
PTP-kappa                                1    4     0     1     2
PTP-mu                                   1    4     0     1     2
PTP-zeta                                 0    1     1     0     2

[1846] PTPase domains consist of about 300 amino acids. There are two conserved cysteines; the second one has been shown to be absolutely required for activity. Furthermore, a number of conserved residues in its immediate vicinity have also been shown to be important.

[1847] A signature pattern was derived for PTPase domains centered on the active site cysteine. 
[1848] There are three profiles for PTPases: the first one spans the complete domain and is not specific to any subtype, the second is specific to dual-specificity PTPases, and the third to the PTP subfamily.
[1849] Consensus pattern [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY] [C is the active site residue] 
[1850] Note: the M-phase inducer phosphatases (cdc25-type phosphatases) are tyrosine-protein phosphatases that are not structurally related to the above PTPases.

[1851] Note: this documentation entry is linked to both a signature pattern and profiles. As profiles are much more sensitive than the pattern, you should use them if you have access to the necessary software tools to do so.
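The annotation "[C is the active site residue]" can be turned into a sequence coordinate by giving that element its own named group in a hand-translated regular expression (x becomes ".", x(n) becomes ".{n}"). A minimal Python sketch follows; the constant and function names are hypothetical and the test peptide is only an illustrative input. As the note above says, a profile search is more sensitive than this single pattern.

```python
import re

# [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY], with the catalytic Cys
# captured as the named group "cys".
PTPASE_SIGNATURE = re.compile(r"[LIVMF]H(?P<cys>C).{2}G.{3}[STC][STAGP].[LIVMFY]")


def catalytic_cysteines(sequence):
    """Return the 1-based positions of putative active-site cysteines."""
    return [m.start("cys") + 1 for m in PTPASE_SIGNATURE.finditer(sequence)]


print(catalytic_cysteines("IVVHCSAGIGRSGTFCL"))  # [5]
```

The same named-group approach applies to any pattern in this section that flags a specific residue, such as the nickel-binding or zinc-binding histidines.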

[1] Fischer E.H., Charbonneau H., Tonks N.K. Science 253:401-406(1991).
[2] Charbonneau H., Tonks N.K. Annu. Rev. Cell Biol. 8:463-493(1992).
[3] Trowbridge I.S. J. Biol. Chem. 266:23517-23520(1991).
[4] Tonks N.K., Charbonneau H. Trends Biochem. Sci. 14:497-500(1989).
[5] Hunter T. Cell 58:1013-1016(1989).

[1852] 780. Connexins signatures 

[1853] Gap junctions [1] are specialized regions of the plasma membrane which consist of closely packed pairs of transmembrane channels, the connexons, through which small molecules diffuse from a cell to a neighboring cell. Each connexon is a hexamer of an integral membrane protein which is often referred to as connexin. In a given species there are a number of different, yet structurally related, tissue-specific forms of connexins. The types of connexins which are currently known are listed below:

- Connexin 56 (Cx56).

- Connexin 50 (Cx50) (lens fiber protein MP70).

- Connexin 46 (Cx46) (alpha-3).

- Connexin 45 (Cx45) (alpha-6).

- Connexin 43 (Cx43) (alpha-1).

- Connexin 40 (Cx40) (alpha-5).

- Connexin 38 (Cx38) (alpha-2).

- Connexin 37 (Cx37) (alpha-4).

- Connexin 33 (Cx33) (alpha-7).

- Connexin 32 (Cx32) (beta-1).

- Connexin 31.1 (Cx31.1) (beta-4).

- Connexin 31 (Cx31) (beta-3).

- Connexin 30.3 (Cx30.3) (beta-5).

- Connexin 26 (Cx26) (beta-2).



[1854] Structurally, the connexins consist of a short cytoplasmic N-terminal domain, followed by four transmembrane segments that delimit two extracellular loops and one cytoplasmic loop; the C-terminal domain is cytoplasmic and its length is variable (from 20 residues in Cx26 to 260 residues in Cx56). The schematic representation of this structure is shown
[1855] The sequences of the two extracellular loops are well conserved. In both loops there are three conserved cysteines which are involved in disulfide bonds. A signature pattern has been built from each of these two loop regions.
[1856] Consensus pattern: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-[FY]-D [The three C's are involved in disulfide bonds]
Consensus pattern: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-[KR]-P [The three C's are involved in disulfide bonds]

[1857] [1] Goodenough D.A., Goliger J.A., Paul D.L. Annu. Rev. Biochem. 65:475-502(1996).
[1858] 781. Gram-positive cocci surface proteins 'anchoring' hexapeptide

[1859] Surface proteins from Gram-positive cocci contain a conserved hexapeptide located a few residues upstream of a hydrophobic C-terminal membrane anchor region, which is followed by a cluster of basic amino acids [1]. This structure is shown in the following schematic representation:



+--------------------------------------+-+------+-+
| Variable length extracellular domain |H|Anchor|B|
+--------------------------------------+-+------+-+

'H': conserved hexapeptide.
'B': cluster of basic residues.






EP 1 033 405 A2 



[1860] It has been proposed that this hexapeptide sequence is responsible for a post-translational modification necessary for the proper anchoring of the proteins which bear it to the cell wall. Proteins known to contain such a hexapeptide are listed below:



- Aggregation substance from Streptococcus faecalis (asa1).
- C5a peptidase from Streptococcus pyogenes (scpA).
- C protein alpha-antigen from Streptococcus agalactiae (bca).
- Cell surface antigen I/II (PAC) from Streptococcus mutans.
- Dextranase from Streptococcus downei (dex).
- Fibronectin-binding protein from Staphylococcus aureus (fnbA).
- Fimbrial subunits from Actinomyces naeslundii and viscosus.
- IgA binding protein from Streptococcus pyogenes (arp4).
- IgA binding protein (B antigen) from Streptococcus agalactiae (bag).
- IgG binding proteins from Streptococci and Staphylococcus aureus.
- Internalin A from Listeria monocytogenes (inlA).
- M proteins from streptococci.
- Muramidase-released protein from Streptococcus suis (mrp).
- Nisin leader peptide processing protease from Lactococcus lactis (nisP).
- Protein A from Staphylococcus aureus.
- Trypsin-resistant surface T protein from streptococci.
- Wall-associated protein from Streptococcus mutans (wapA).
- Wall-associated serine proteinases from Lactococcus lactis.



[1861] Consensus pattern: L-P-x-T-G-[STGAVDE]
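The schematic above implies that the LPXTG-type hexapeptide is meaningful only near the C-terminus, ahead of the hydrophobic anchor and the basic tail. A rough Python sketch of such a positional filter follows; the thresholds max_tail=50 and min_basic=2, the 10-residue window used for the basic cluster, and the function name are illustrative assumptions, not values given in this entry.

```python
import re

# L-P-x-T-G-[STGAVDE] written as a regular expression.
SORTING_MOTIF = re.compile(r"LP.TG[STGAVDE]")


def c_terminal_sorting_signal(sequence, max_tail=50, min_basic=2):
    """Return the 1-based start of a C-proximal LPXTG-type hexapeptide, or None.

    The last hexapeptide counts only if at most max_tail residues follow it
    and the final 10 residues contain at least min_basic Lys/Arg (the basic
    cluster). Both thresholds are illustrative assumptions.
    """
    hits = list(SORTING_MOTIF.finditer(sequence))
    if not hits or len(sequence) - hits[-1].end() > max_tail:
        return None
    basic = sum(residue in "KR" for residue in sequence[-10:])
    return hits[-1].start() + 1 if basic >= min_basic else None


print(c_terminal_sorting_signal("MSEQLPETGEGLLAIVLAGLLKRRKK"))  # 5
```

Only the last hit is checked, because it has the shortest C-terminal tail; if it fails the distance test, every earlier hit fails too.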

[1862] [1] Schneewind O., Jones K.F., Fischetti V.A. J. Bacteriol. 172:3310-3317(1990).
[1863] 782. Gamma-glutamyltranspeptidase signature 

[1864] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In both prokaryotes and eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single chain precursor. The active site of GGT is known to be located in the light subunit.

[1865] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show some GGT activity [3]. Like GGT, these GL-7ACA acylases are also composed of two subunits.
[1866] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these enzymes. This region has been used as a signature pattern.

[1867] Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G
[1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985).

[2] Suzuki K., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).
[3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992).

[1868] 783. Ferrochelatase signature 

[1869] Ferrochelatase (EC 4.99.1.1) (protoheme ferro-lyase) [1,2] catalyzes the last step in heme biosynthesis: the chelation of a ferrous ion to protoporphyrin IX to form protoheme.

[1870] In eukaryotes, ferrochelatase is a mitochondrial protein bound to the inner membrane, whose active site faces the mitochondrial matrix. The mature form of eukaryotic ferrochelatase is composed of about 360 amino acids. In bacteria, ferrochelatase (gene hemH) [3] is a protein of 310 to 380 amino acids.

[1871] The human autosomal dominant disease protoporphyria is due to the reduced activity of ferrochelatase. 
[1872] The signature pattern for this enzyme is based on a conserved region which contains a histidine residue that could be involved in binding iron.

[1873] Consensus pattern: [LIVMF](2)-x-[ST]-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-[DP]-x(1,2)-Y

[1] Labbe-Bois R. J. Biol. Chem. 265:7278-7283(1990).

[2] Brenner D.A., Frasier F. Proc. Natl. Acad. Sci. U.S.A. 88:849-853(1991).

[3] Miyamoto K., Nakahigashi K., Nishimura K., Inokuchi K. J. Mol. Biol. 219:393-398(1991).
[1874] 784. Cellulose-binding domain, bacterial type

[1875] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1].
[1876] Structurally, cellulases and xylanases generally consist of a catalytic domain joined to a cellulose-binding 
domain (CBD) by a short linker sequence rich in proline and/or hydroxy-amino acids. 

[1877] The CBD of a number of bacterial cellulases has been shown to consist of about 105 amino acid residues 
[2]. Enzymes known to contain such a domain are: 

- Endoglucanase (gene end1) from Butyrivibrio fibrisolvens.
- Endoglucanases A (gene cenA) and B (cenB) from Cellulomonas fimi.
- Exoglucanases A (gene cbhA) and B (cbhB) from Cellulomonas fimi.
- Endoglucanase E-2 (gene celB) from Thermomonospora fusca.
- Endoglucanase A (gene celA) from Microbispora bispora.
- Endoglucanases A (gene celA), B (celB) and C (celC) from Pseudomonas fluorescens.
- Endoglucanase A (gene celA) from Streptomyces lividans.
- Exocellobiohydrolase (gene cex) from Cellulomonas fimi.
- Xylanases A (gene xynA) and B (xynB) from Pseudomonas fluorescens.
- Arabinofuranosidase C (EC 3.2.1.55) (xylanase C) (gene xynC) from Pseudomonas fluorescens.
- Chitinase 63 (EC 3.2.1.14) from Streptomyces plicatus.
- Chitinase C from Streptomyces lividans.

[1878] The CBD domain is found either at the N-terminal or at the C-terminal extremity of these enzymes. As shown in the following schematic representation, there are two conserved cysteines in this CBD domain, one at each extremity of the domain, which have been shown [3] to be involved in a disulfide bond. There are also four conserved tryptophan residues which could be involved in the interaction of the CBD with polysaccharides.

 +-------------------------------------------------+
 |                                                 |
xCxxxxWxxxxxNxxxWxxxxxxxWxxxxxxxxWNxxxxxGxxxxxxxxxxCx
                                 **************

'C': conserved cysteine involved in a disulfide bond.
'*': position of the pattern.



Consensus pattern: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA]

[1] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[2] Meinke A., Gilkes N.R., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Protein Seq. Data Anal. 4:349-353(1991).
[3] Gilkes N.R., Claeyssens M., Aebersold R., Henrissat B., Meinke A., Morrison H.D., Kilburn D.G., Warren R.A.J., Miller R.C. Jr. Eur. J. Biochem. 202:367-377(1991).

[1879] 785. Amidases signature 

[1880] It has been shown [1,2,3] that several enzymes from various prokaryotic and eukaryotic organisms which are involved in the hydrolysis of amides (amidases) are evolutionarily related. These enzymes are listed below:

- Indoleacetamide hydrolase (EC 3.5.1.-), a bacterial plasmid-encoded enzyme that catalyzes the hydrolysis of indole-3-acetamide (IAM) into indole-3-acetate (IAA), the second step in the biosynthesis of auxins from tryptophan.

- Acetamidase from Emericella nidulans (gene amdS), an enzyme which allows acetamide to be used as a sole carbon or nitrogen source.

- Amidase (EC 3.5.1.4) from Rhodococcus sp. N-774 and Brevibacterium sp. R312 (gene amdA). This enzyme hydrolyzes propionamide efficiently and, at a lower efficiency, acetamide, acrylamide and indoleacetamide.

- Amidase (EC 3.5.1.4) from Pseudomonas chlororaphis.

- 6-aminohexanoate-cyclic-dimer hydrolase (EC 3.5.2.12) (nylon oligomers degrading enzyme EI) (gene nylA), a bacterial plasmid-encoded enzyme which catalyzes the first step in the degradation of 6-aminohexanoic acid cyclic dimer, a by-product of nylon manufacture [4].

- Glutamyl-tRNA(Gln) amidotransferase subunit A [5].

- Mammalian fatty acid amide hydrolase (gene FAAH) [6].

- A putative amidase from yeast (gene AMD2).

- Mycobacterium tuberculosis putative amidases amiA2, amiB2, amiC and amiD.

[1881] All these enzymes contain in their central section a highly conserved region rich in glycine, serine and alanine residues. This region has been used as a signature pattern.

Consensus pattern: G-[GA]-S-[GS]-[GS]-G-x-[GSA]-[GSAW]-x-[LIVM]-[GSA]-x(6)-[GSAT]-x-[GA]-x-[DE]-x-[GA]-x-S-[LIVM]-R-x-P-[GSAC]

[1] Mayaux J.-F., Cerbelaud E., Soubrier F., Faucher D., Petre D. J. Bacteriol. 172:6764-6773(1990).
[2] Hashimoto Y., Nishiyama M., Ikehata O., Horinouchi S., Beppu T. Biochim. Biophys. Acta 1088:225-233(1991).
[3] Chang T.-H., Abelson J. Nucleic Acids Res. 18:7180-7180(1990).
[4] Tsuchiya K., Fukuyama S., Kanzaki N., Kanagawa K., Negoro S., Okada H. J. Bacteriol. 171:3187-3191(1989).

[5] Curnow A.W., Hong K.W., Yuan R., Kim S.I., Martins O., Winkler W., Henkin T.M., Soll D. Proc. Natl. Acad. Sci. U.S.A. 94:11819-11826(1997).

[6] Cravatt B.F., Giang D.K., Mayfield S.P., Boger D.L., Lerner R.A., Gilula N.B. Nature 384:83-87(1996).

[1882] 786. Glycosyl hydrolases family 10 active site

[1883] The microbial degradation of cellulose and xylans requires several types of enzymes such as endoglucanases (EC 3.2.1.4), cellobiohydrolases (EC 3.2.1.91) (exoglucanases), or xylanases (EC 3.2.1.8) [1,2]. Fungi and bacteria produce a spectrum of cellulolytic enzymes (cellulases) and xylanases which, on the basis of sequence similarities, can be classified into families. One of these families is known as the cellulase family F [3] or as the glycosyl hydrolases family 10 [4,E1]. The enzymes which are currently known to belong to this family are listed below:

- Aspergillus awamori xylanase A (xynA).
- Bacillus sp. strain 125 xylanase (xynA).
- Bacillus stearothermophilus xylanase.
- Butyrivibrio fibrisolvens xylanases A (xynA) and B (xynB).
- Caldocellum saccharolyticum bifunctional endoglucanase/exoglucanase (celB). This protein consists of two domains; it is the N-terminal domain, which has exoglucanase activity, that belongs to this family.
- Caldocellum saccharolyticum xylanase A (xynA).
- Caldocellum saccharolyticum ORF4. This hypothetical protein is encoded in the xynABC operon and is probably a xylanase.
- Cellulomonas fimi exoglucanase/xylanase (cex).
- Clostridium stercorarium thermostable celloxylanase.
- Clostridium thermocellum xylanases Y (xynY) and Z (xynZ).
- Cryptococcus albidus xylanase.
- Penicillium chrysogenum xylanase (gene xylP).
- Pseudomonas fluorescens xylanases A (xynA) and B (xynB).
- Ruminococcus flavefaciens bifunctional xylanase XYLA (xynA). This protein consists of three domains: an N-terminal xylanase catalytic domain that belongs to family 11 of glycosyl hydrolases; a central domain composed of short repeats of Gln, Asn and Trp; and a C-terminal xylanase catalytic domain that belongs to family 10 of glycosyl hydrolases.
- Streptomyces lividans xylanase A (xlnA).
- Thermoanaerobacter saccharolyticum endoxylanase A (xynA).
- Thermoascus aurantiacus xylanase.
- Thermophilic bacterium Rt8.B4 xylanase (xynA).

[1884] One of the conserved regions in these enzymes is centered on a conserved glutamic acid residue which has been shown [5], in the exoglucanase from Cellulomonas fimi, to be directly involved in glycosidic bond cleavage by acting as a nucleophile. This region has been used as a signature pattern.

[1885] Consensus pattern: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF] [E is the active site residue]

[1] Beguin P. Annu. Rev. Microbiol. 44:219-248(1990).

[2] Gilkes N.R., Henrissat B., Kilburn D.G., Miller R.C. Jr., Warren R.A.J. Microbiol. Rev. 55:303-315(1991).
[3] Henrissat B., Claeyssens M., Tomme P., Lemesle L., Mornon J.-P. Gene 81:83-95(1989).
[4] Henrissat B. Biochem. J. 280:309-316(1991).

[5] Tull D., Withers S.G., Gilkes N.R., Kilburn D.G., Warren R.A.J., Aebersold R. J. Biol. Chem. 266:15621-15625(1991).

[1886] 787. Fructose-bisphosphate aldolase class-II signatures

[1887] Fructose-bisphosphate aldolase (EC 4.1.2.13) [1,2] is a glycolytic enzyme that catalyzes the reversible aldol cleavage or condensation of fructose-1,6-bisphosphate into dihydroxyacetone-phosphate and glyceraldehyde 3-phosphate. There are two classes of fructose-bisphosphate aldolases with different catalytic mechanisms. Class-II aldolases [2], mainly found in prokaryotes and fungi, are homodimeric enzymes which require a divalent metal ion, generally zinc, for their activity.
[1888] This family also includes the following proteins:

- Escherichia coli galactitol operon protein gatY, which catalyzes the transformation of tagatose 1,6-bisphosphate into glycerone phosphate and D-glyceraldehyde 3-phosphate.

- Escherichia coli N-acetyl galactosamine operon protein agaY, which catalyzes the same reaction as that of gatY.

[1889] As signature patterns for this class of enzyme, two conserved regions were selected. The first pattern is located in the first half of the sequence and contains two histidine residues that have been shown [4] to be involved in binding a zinc ion. The second is located in the C-terminal section and contains clustered acidic residues and glycines.
[1890] Consensus pattern: [FYVMT]-x(1,3)-[LIVM]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GACH] [The two H's are zinc ligands]

Consensus pattern: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E

[1] Perham R.N. Biochem. Soc. Trans. 18:185-187(1990).

[2] Marsh J.J., Lebherz H.G. Trends Biochem. Sci. 17:110-113(1992).

[3] von der Osten C.H., Barbas C.F. III, Wong C.-H., Sinskey A.J. Mol. Microbiol. 3:1625-1637(1989).
[4] Berry A., Marshall K.E. FEBS Lett. 318:11-16(1993).

[1891] 788. Prolyl oligopeptidase family serine active site

[1892] The prolyl oligopeptidase family [1,2,3] consists of a number of evolutionarily related peptidases whose catalytic activity seems to be provided by a charge relay system similar to that of the trypsin family of serine proteases, but which evolved by independent convergent evolution. The known members of this family are listed below:

- Prolyl endopeptidase (EC 3.4.21.26) (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high degree of sequence conservation between these sequences.

- Escherichia coli protease II (EC 3.4.21.83) (oligopeptidase B) (gene prtB), which cleaves peptide bonds on the C-terminal side of lysyl and arginyl residues.

- Dipeptidyl peptidase IV (EC 3.4.14.5) (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially from polypeptides having unsubstituted N-termini, provided that the penultimate residue is proline.

- Yeast vacuolar dipeptidyl aminopeptidase A (DPAP A) (gene: STE13), which is responsible for the proteolytic maturation of the alpha-factor precursor.

- Yeast vacuolar dipeptidyl aminopeptidase B (DPAP B) (gene: DAP2).

- Acylamino-acid-releasing enzyme (EC 3.4.19.1) (acyl-peptide hydrolase). This enzyme catalyzes the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein with a free amino-terminus.

[1893] A conserved serine residue has experimentally been shown (in E. coli protease II as well as in pig and bacterial PE) to be necessary for the catalytic mechanism. This serine, which is part of the catalytic triad (Ser, His, Asp), is generally located about 150 residues away from the C-terminal extremity of these enzymes (which are all proteins that contain about 700 to 800 amino acids).

[1894] Consensus pattern: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2) [S is the active site residue]
[1895] Note: these proteins belong to families S9A/S9B/S9C in the classification of peptidases [4,E1].
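The paragraph above places the catalytic serine roughly 150 residues from the C-terminus. The Python sketch below hand-translates the consensus pattern, captures the serine as a named group, and reports how many residues lie C-terminal to it, so that a hit far from that typical distance can be noticed. The constant and function names are hypothetical, and the test sequence is synthetic, built only to contain one match.

```python
import re

# D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2),
# with the active-site Ser captured as the named group "ser".
PROLYL_OLIGOPEPTIDASE_SITE = re.compile(
    r"D.{3}A.{3}[LIVMFYW].{14}G.(?P<ser>S).GG[LIVMFYW]{2}"
)


def serine_offsets(sequence):
    """Return (1-based Ser position, residues after the Ser) for each hit."""
    hits = []
    for match in PROLYL_OLIGOPEPTIDASE_SITE.finditer(sequence):
        position = match.start("ser") + 1
        hits.append((position, len(sequence) - position))
    return hits


test_sequence = "M" + "DAAAAAAAL" + "A" * 14 + "GASAGGLL" + "K" * 148
print(serine_offsets(test_sequence))  # [(27, 153)]
```

How far from 150 a hit may lie before it is treated as suspicious is left to the reader; the entry gives only an approximate figure.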

[1] Rawlings N.D., Polgar L., Barrett A.J. Biochem. J. 279:907-911(1991).
[2] Barrett A.J., Rawlings N.D. Biol. Chem. Hoppe-Seyler 373:353-360(1992).
[3] Polgar L., Szabo E. Biol. Chem. Hoppe-Seyler 373:361-366(1992).
[4] Rawlings N.D., Barrett A.J. Meth. Enzymol. 244:19-61(1994).
[1896] 789. Formate-tetrahydrofolate ligase signatures

[1897] Formate-tetrahydrofolate ligase (EC 6.3.4.3) (formyltetrahydrofolate synthetase) (FTHFS) is one of the enzymes participating in the transfer of one-carbon units, an essential element of various biosynthetic pathways. In many of these processes the transfers of one-carbon units are mediated by the coenzyme tetrahydrofolate (THF). Various reactions generate one-carbon derivatives of THF which can be interconverted between different oxidation states by FTHFS, methylenetetrahydrofolate dehydrogenase (EC 1.5.1.5) and methenyltetrahydrofolate cyclohydrolase (EC 3.5.4.9).

[1898] In eukaryotes the FTHFS activity is expressed by a multifunctional enzyme, C1-tetrahydrofolate synthase (C1-THF synthase), which also catalyzes the dehydrogenase and cyclohydrolase activities. Two forms of C1-THF synthase are known [1]: one is located in the mitochondrial matrix, while the second one is cytoplasmic. In both forms the FTHFS domain consists of about 600 amino acid residues and is located in the C-terminal section of C1-THF synthase. In prokaryotes FTHFS activity is expressed by a monofunctional homotetrameric enzyme of about 560 amino acid residues [2].

[1899] The sequence of FTHFS is highly conserved in all forms of the enzyme. As signature patterns, two regions that are almost perfectly conserved were selected. The first one is a glycine-rich segment located in the N-terminal part of FTHFS, which could be part of an ATP-binding domain [2]. The second pattern is located in the central section of FTHFS.

[1900] Consensus pattern: G-[LIVM]-K-G-G-A-A-G-G-G-Y
Consensus pattern: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G
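Where an entry defines two signatures with a known order along the chain, as here (a glycine-rich N-terminal segment, then a central one), both can be required together. The Python sketch below does this with two hand-translated patterns; requiring both hits, with the central one after the N-terminal one, is an illustrative assumption about how to combine them, and the function name and test sequence are hypothetical.

```python
import re

FTHFS_N_TERMINAL = re.compile(r"G[LIVM]KGGAAGGGY")   # G-[LIVM]-K-G-G-A-A-G-G-G-Y
FTHFS_CENTRAL = re.compile(r"VAT[IV]RALK.[HN]GG")    # V-A-T-[IV]-R-A-L-K-x-[HN]-G-G


def has_both_fthfs_signatures(sequence):
    """True if the central signature occurs after the N-terminal one."""
    first = FTHFS_N_TERMINAL.search(sequence)
    if first is None:
        return False
    # Compiled patterns accept a start position, so the second search
    # only looks downstream of the first hit.
    return FTHFS_CENTRAL.search(sequence, first.end()) is not None


sample = "M" + "GLKGGAAGGGY" + "A" * 30 + "VATIRALKAHGG"
print(has_both_fthfs_signatures(sample))  # True
```

Requiring both signatures in order trades some sensitivity for specificity; a fragment carrying only one conserved region would be rejected.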

[1] Shannon K.W., Rabinowitz J.C. J. Biol. Chem. 263:7717-7725(1988).

[2] Lovell C.R., Przybyla A., Ljungdahl L.G. Biochemistry 29:5687-5694(1990).

[1901] 790. Transthyretin signatures 

[1902] Transthyretin (prealbumin) [1] is a thyroid hormone-binding protein that seems to transport thyroxine (T4) from the bloodstream to the brain. It is a protein of about 130 amino acids that assembles as a homotetramer and forms an internal channel that binds thyroxine. Transthyretin is mainly synthesized in the brain choroid plexus. In humans, variants of the protein are associated with distinct forms of amyloidosis.

[1903] The sequence of transthyretin is highly conserved in vertebrates. A number of uncharacterized proteins also belong to this family:

- Escherichia coli hypothetical protein yedX.

- Bacillus subtilis hypothetical protein yunM.

- Caenorhabditis elegans hypothetical protein R09H10.3.

- Caenorhabditis elegans hypothetical protein ZK697.8.

[1904] Two regions were selected as signature patterns. The first, located in the N-terminal extremity, starts with a lysine known to be involved in binding T4. The second pattern is located in the C-terminal extremity.

[1905] Consensus pattern: [KH]-[IV]-L-[DN]-x(3)-G-x-P-A-x(2)-[IV]-x-[IV] [The K binds thyroxine]

Consensus pattern: Y-[TH]-[IV]-[AP]-x(2)-L-S-[PQ]-[FYW]-[GS]-[FY]-[QS]

[1906] [1] Schreiber G., Richardson S.J. Comp. Biochem. Physiol. 116B:137-160(1997).

[1907] 791. Dihydropteroate synthase signatures

[1908] All organisms require reduced folate cofactors for the synthesis of a variety of metabolites. Most microorgan- 
isms must synthesize folate de novo because they lack the active transport system of higher vertebrate cells which 
allows these organisms to use dietary folates. Enzymes that are involved in the biosynthesis of folates are therefore 
the target of a variety of antimicrobial agents such as trimethoprim or sulfonamides. 

[1909] Dihydropteroate synthase (EC 2.5.1.15) (DHPS) catalyzes the condensation of 6-hydroxymethyl-7,8-dihydropteridine pyrophosphate with para-aminobenzoic acid to form 7,8-dihydropteroate. This is the second step in the three-step pathway leading from 6-hydroxymethyl-7,8-dihydropterin to 7,8-dihydrofolate. DHPS is the target of sulfonamides, which are substrate analogs that compete with para-aminobenzoic acid.

[1910] Bacterial DHPS (gene sul or folP) [1] is a protein of about 275 to 315 amino acid residues which is either chromosomally encoded or found on various antibiotic resistance plasmids. In the lower eukaryote Pneumocystis carinii, DHPS is the C-terminal domain of a multifunctional folate synthesis enzyme (gene fas) [2].

[1911] Two signature patterns for DHPS were developed: the first signature is located in the N-terminal section of these enzymes, while the second signature is located in the central section.

[1912] Consensus pattern: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG]
Consensus pattern: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P

[1] Slock J., Stahly D.P., Han C.-Y., Six E.W., Crawford I.P. J. Bacteriol. 172:7211-7226(1990).

[2] Volpe F., Dyer M., Scaife J.G., Darby G., Stammers D.K., Delves C.J. Gene 112:213-218(1992).

[1913] 792. Phosphatidylinositol 3- and 4-kinases signatures

[1914] Phosphatidylinositol 3-kinase (PI3-kinase) (EC 2.7.1.137) [1] is an enzyme that phosphorylates phosphoinositides on the 3-hydroxyl group of the inositol ring. The exact function of the three products of PI3-kinase (PI-3-P, PI-3,4-P(2) and PI-3,4,5-P(3)) is not yet known, although it is proposed that they function as second messengers in cell signalling. Currently, three forms of PI3-kinase are known:

- The mammalian enzyme, which is a heterodimer of a 110 Kd catalytic chain (p110) and an 85 Kd subunit (p85) which allows it to bind to activated tyrosine protein kinases. There are at least two different types of p110 subunits (alpha and beta).

- Yeast TOR1/DRR1 and TOR2/DRR2 [2], PI3-kinases required for cell cycle activation. Both are proteins of about 280 Kd.

- Yeast VPS34 [3], a PI3-kinase involved in vacuolar sorting and segregation. VPS34 is a protein of about 100 Kd.

- Arabidopsis thaliana and soybean VPS34 homologs. 

[1915] Phosphatidylinositol 4-kinase (PI4-kinase) (EC 2.7.1.67) [4] is an enzyme that acts on phosphatidylinositol (PI) in the first committed step in the production of the second messenger inositol-1,4,5-trisphosphate. Currently the following forms of PI4-kinases are known:

- Human PI4-kinase alpha.

- Yeast PIK1, a nuclear protein of 120 Kd.

- Yeast STT4, a protein of 214 Kd.

[1916] The PI3- and PI4-kinases share a well conserved domain at their C-terminal section; this domain seems to be distantly related to the catalytic domain of protein kinases [2]. Two signature patterns were developed from the best conserved parts of this domain.
[1917] The following additional proteins belong to this family:

- Mammalian FKBP-rapamycin associated protein (FRAP) [5], which acts as the target for the cell-cycle arrest and immunosuppressive effects of the FKBP12-rapamycin complex.

- Yeast protein ESR1 [6], which is required for cell growth, DNA repair and meiotic recombination.

- Yeast protein TEL1, which is involved in controlling telomere length.

- Yeast hypothetical protein YHR099w, a distantly related member of this family.

- Fission yeast hypothetical protein SpAC22E12.16C.

[1918] Consensus pattern: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q
Consensus pattern: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)

[1] Hiles I.D., Otsu M., Volinia S., Fry M.J., Gout I., Dhand R., Panayotou G., Ruiz-Larrea F., Thompson A., Totty N.F., Hsuan J.J., Courtneidge S.A., Parker P.J., Waterfield M.D. Cell 70:419-429(1992).

[2] Kunz J., Henriquez R., Schneider U., Deuter-Reinhard M., Movva N.R., Hall M.N. Cell 73:585-596(1993).

[3] Schu P.V., Takegawa K., Fry M.J., Stack J.H., Waterfield M.D., Emr S.D. Science 260:88-91(1993).

[4] Garcia-Bustos J.F., Marini F., Stevenson I., Frei C., Hall M.N. EMBO J. 13:2352-2361(1994).

[5] Brown E.J., Albers M.W., Shin T.B., Ichikawa K., Keith C.T., Lane W.S., Schreiber S.L. Nature 369:756-758(1994).

[6] Kato R., Ogawa H. Nucleic Acids Res. 22:3104-3112(1994).
[1919] 793. FAD-dependent glycerol-3-phosphate dehydrogenase signatures

[1920] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of gtyc- 
erol-3.phosphate into dihydroxyacetone phosphate. In bacteria [1 ] it is associated with the utilization of glycerol coupled 
to respiration. In Escherichia coli. two isozymes are known: one expressed under anaerobic conditbns (gene glpA) 
and one in aerobic conditions (gene gIpD). In eukaryotes. a mitochondrial form of GPD participates in the glycerol 
phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1 .1 .1 .8) [2.3]. 
[1921] These enzymes are proteins of about 60 to 70 Kd which contain a probable FAD-binding domain in their N- 
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terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand calcium- 
binding region (See <PDC)C00018>) in its C-temninal extremity. 

[1922] Two signature patterns were developed: one based on the first half of the FAD-binding domain, and one corresponding to a conserved region in the central part of these enzymes.
[1923] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A
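The consensus patterns in this section use PROSITE notation: `x` stands for any amino acid, residues in square brackets are alternatives, residues in curly braces are excluded, and `(n)` or `(n,m)` gives a repeat count or range. A minimal Python sketch of how such a pattern could be applied to a protein sequence is shown below. It assumes only this basic subset of the notation (no `<`/`>` terminal anchors); the function names `prosite_to_regex` and `find_motifs` are hypothetical, not part of any PROSITE tool, and the query sequence is synthetic.

```python
import re
from typing import List, Tuple

# One PROSITE element: 'x', a single residue, an allowed set [..] or an excluded
# set {..}, optionally followed by a repeat count (n) or a range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE consensus pattern into a Python regex.

    Only the subset written in this section is handled; anchors ('<', '>')
    are rejected with ValueError rather than translated.
    """
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                           # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"    # any residue except these
        else:
            token = residue                       # 'G' or '[GSTE]' is already valid regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Synthetic query built to contain one GPD central-region motif (not a real protein).
print(find_motifs("G-G-K-x(2)-[GSTE]-Y-R-x(2)-A", "MSTGGKLLSYRQQAPLN"))  # [(4, 'GGKLLSYRQQA')]
```

Because `finditer` reports non-overlapping matches only, overlapping occurrences of a motif would be missed; a dedicated scanning tool may behave differently.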

[1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).

[2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).

[3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-14366(1994).

[1924] 794. NOL1/NOP2/sun family signature 

[1925] The following proteins seem to be evolutionarily related:

- Mammalian proliferating-cell nucleolar antigen p120 (gene NOL1), which may play a role in the regulation of the cell cycle and in the increased nucleolar activity associated with cell proliferation.

- Yeast nucleolar protein NOP2 (or YNA1), which could be involved in nucleolar function during the onset of growth and in the maintenance of nucleolar structure.

- Yeast hypothetical protein YBL024w.

- Bacterial protein sun (also known as fmu).

- Escherichia coli hypothetical protein yebU.

- Mycobacterium tuberculosis hypothetical protein MtCY21B4.24.

- Methanococcus jannaschii hypothetical protein MJ0026. 

NOL1 is a protein of 855 residues, NOP2 of 618 residues and YBL024w of 684; sun is a protein of about 430 to 450 residues and MJ0026 has 274 residues. They share a conserved central domain containing several highly conserved regions, one of which was selected as a signature pattern.
[1926] Consensus pattern: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA]
[1927] 795. moaA/nifB/pqqE family signature

[1928] A number of proteins involved in the biosynthesis of metallo cofactors have been shown [1,2] to be evolutionarily related. These proteins are:

- Bacterial and archaebacterial protein moaA, which is involved in the biosynthesis of the molybdenum cofactor (molybdopterin; MPT).

- Arabidopsis thaliana cnx2, a protein involved in molybdopterin biosynthesis which is highly similar to moaA.

- Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium.

- Bacterial protein nifB (or fixZ) which is involved in the biosynthesis of the nitrogenase iron-molybdenum cofactor. 

- Bacterial protein pqqE which is involved in the biosynthesis of the cofactor pyrrolo-quinoline-quinone (PQQ). 

- Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based tungsten cofactor.

- Caenorhabditis elegans hypothetical protein F49E2.1.

[1929] All these proteins share, in their N-terminal region, a conserved domain that contains three cysteines. In moaA, these cysteines have been shown [1] to be important for biological activity. They could be involved in the binding of an iron-sulfur cluster.

[1930] Consensus pattern: [LIV]-x(3)-C-[NP]-[LIVMFHQRS]-C-x-[FYM]-C [The three C's are putative Fe-S ligands]

[1] Menendez C., Igloi G., Henninger H., Brandsch R. Arch. Microbiol. 164:142-151(1995).
[2] Hoff T., Schnorr K.M., Meyer C., Caboche M. J. Biol. Chem. 270:6100-6107(1995).

[1931] 796. Forkhead-associated (FHA) domain profile

[1932] The forkhead-associated (FHA) domain [1,E1] is a putative nuclear signalling domain found in a variety of otherwise unrelated proteins. The FHA domain comprises approximately 55 to 75 amino acids and contains three highly conserved blocks separated by divergent spacer regions. It has so far been found in the following proteins:

- Four transcription factors that also contain a forkhead (FH) domain: mouse myocyte nuclear factor 1 (MNF1), yeast transcription factor FHL1, which probably controls pre-mRNA processing, and yeast FKH1 and FKH2. In these proteins the FHA domain is located N-terminal of the DNA-binding FH domain.

- Kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana, a protein which specifically interacts with the receptor-type Ser/Thr kinase RLK5. In KAPP, the FHA domain maps to a region that interacts with RLK5 only if the kinase is phosphorylated on serine residues [2].

- Two protein kinases from yeast that are involved in mediating the nuclear response to DNA damage: DUN1 and SPK1/SAD1 [3]. The latter is the only known protein containing two copies of the FHA domain.

- Protein kinase cds1 from fission yeast, which contains an FHA domain and might be the ortholog of SPK1.

- Protein kinase MEK1 from yeast, which is involved in meiotic recombination.

- Human nuclear antigen Ki67, which is expressed only in proliferating cells.

- Yeast hypothetical protein YHR115c, which contains a RING finger C-terminal of the FHA domain.

- Yeast hypothetical proteins L8083.1 and 9346.10, which contain an extensive coiled-coil region C-terminal of the FHA domain.



- Caenorhabditis elegans hypothetical protein ZK632.2.

- Caenorhabditis elegans hypothetical protein C01G6.5.

- FraH from the prokaryote Anabaena, which contains a zinc-finger motif N-terminal of the FHA domain. 

- An ORF from the bacterium Streptomyces, which lies on the opposite strand of the protein kinase pksl, overlapping the ORF of the kinase.

[1] Hofmann K.O., Bucher P. Trends Biochem. Sci. 20:347-349(1995).

[2] Stone J.M., Collinge M.A., Smith R.D., Horn M.A., Walker J.C. Science 266:793-795(1994).
[3] Navas T.A., Zhou Z., Elledge S.J. Cell 80:29-39(1995).

[1933] 797. Ald_Xan_dh_C 

Aldehyde oxidase and xanthine dehydrogenase, C terminus 

[1934] [1] Romao MJ, Archer M, Moura I, Moura JJ, LeGall J, Engh R, Schneider M, Hof P, Huber R; Medline: 96072968; "Crystal structure of the xanthine oxidase-related aldehyde oxido-reductase from D. gigas." Science 1995;270:1170-1176.

Number of members: 54 

[1935] 798. Glyco_hydro_38 
Glycosyl hydrolases family 38 

[1936] Glycosyl hydrolases are key enzymes of carbohydrate metabolism. 
Number of members: 20 

[1937] [1] Henrissat B; Medline: 98313424; "Glycosidase families." Biochem Soc Trans 1998;26:153-156.

[1938] 799. HECT 

HECT domain (ubiquitin-transferase).

[1939] The name HECT comes from Homologous to the E6-AP Carboxyl Terminus. 
Number of members: 43 

[1940] [1] Huibregtse JM, Scheffner M, Beaudenon S, Howley PM; Medline: 95223981; "A family of proteins structurally and functionally related to the E6-AP ubiquitin-protein ligase." Proc Natl Acad Sci U S A 1995;92:2563-2567.
[1941] 800. HRDC 
HRDC domain 

[1942] The HRDC (Helicase and RNase D C-terminal) domain has a putative role in nucleic acid binding. Mutations 
in the HRDC domain cause human disease. 

Number of members: 19

[1943] [1] Morozov V, Mushegian AR, Koonin EV, Bork P; Medline: 98060076; "A putative nucleic acid-binding domain in Bloom's and Werner's syndrome helicases." Trends Biochem Sci 1997;22:417-418.
[1944] 801. Integrase 

[1945] Integrase mediates integration of a DNA copy of the viral genome into the host chromosome. Integrase is composed of three domains. The amino-terminal domain is a zinc-binding domain. The central domain is the catalytic domain [1]. The carboxyl-terminal domain is a DNA-binding domain [2].
Number of members: 581



[1946]

[1] Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR; Medline: 95099322; "Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases." Science 1994;266:1981-1986.

[2] Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM, Gronenborn AM; Medline: 95359147; "Solution structure of the DNA binding domain of HIV-1 integrase." Biochemistry 1995;34:9826-9833.

[1947] 802. lig_chan
Ligand-gated ion channel

[1948] This family includes the four transmembrane regions of the ionotropic glutamate receptors and NMDA receptors.



Number of members: 128 



[1949] [1] Tong G, Shepherd D, Jahr CE; Medline: 95184014; "Synaptic desensitization of NMDA receptors by calcineurin." Science 1995;267:1510-1512.
[1950] 803. RhoGAP 
RhoGAP domain 

[1951] GTPase activator proteins towards Rho/Rac/Cdc42-like small GTPases. 
Number of members: 97 



[1952]

[1] Musacchio A, Cantley LC, Harrison SC; Medline: 97121392; "Crystal structure of the breakpoint cluster region-homology domain from phosphoinositide 3-kinase p85 alpha subunit." Proc Natl Acad Sci U S A 1996;93:14373-14378.

[2] Barrett T, Xiao B, Dodson EJ, Dodson G, Ludbrook SB, Nurmahomed K, Gamblin SJ, Musacchio A, Smerdon SJ, Eccleston JF; Medline: 97162209; "The structure of the GTPase-activating domain from p50rhoGAP." Nature 1997;385:458-461.

[3] Rittinger K, Walker PA, Eccleston JF, Nurmahomed K, Owen D, Laue E, Gamblin SJ, Smerdon SJ; Medline: 97404320; "Crystal structure of a small G protein in complex with the GTPase-activating protein rhoGAP." Nature 1997;388:693-697.

[4] Boguski MS, McCormick F; Medline: 94081948; "Proteins regulating Ras and its relatives." Nature 1993;366:643-654.



[1953] 804. vwd 

von Willebrand factor type D domain 

[1954] [1] Bork P; Medline: 93327926; "The modular architecture of a new family of growth regulators related to connective tissue growth factor." FEBS Lett 1993;327:125-130.

Number of members: 92 



[1955] 805. zf-C4_Topoisom 
Topoisomerase DNA binding C4 zinc finger 

[1] Tse-Dinh YC, Beran-Steed RK; Medline: 89034032; "Escherichia coli DNA topoisomerase I is a zinc metalloprotein with three repetitive zinc-binding domains." J Biol Chem 1988;263:15857-15859.
[2] Ahumada A, Tse-Dinh YC; Medline: 99011409; "The Zn(II) binding motifs of E. coli DNA topoisomerase I is part of a high-affinity DNA binding domain." Biochem Biophys Res Commun 1998;251:509-514.

Number of members: 51 



[1956] 806. AIRC 
AIR carboxylase
Members of this family catalyse the decarboxylation of 1-(5-phosphoribosyl)-5-amino-4-imidazole-carboxylate (AIR).

This family catalyses the sixth step of de novo purine biosynthesis. Some members of this family contain two copies of this domain.

Number of members: 35

[1957] 807. Bromodomain signature and profile

PROSITE cross-reference(s): PS00633; BROMODOMAIN_1, PS50014; BROMODOMAIN_2

The bromodomain [1,2,3] is a conserved region of about 70 amino acids found in the following proteins:

- Higher eukaryote transcription initiation factor TFIID 250 kDa subunit (TBP-associated factor p250) (gene CCG1). P250 is associated with the TFIID TATA-box binding protein and seems essential for progression of the G1 phase of the cell cycle.

- Human RING3, a protein of unknown function encoded in the MHC class II locus.

- Mammalian CREB-binding protein (CBP), which mediates cAMP-regulated gene expression by binding specifically to phosphorylated CREB protein.

- Drosophila female sterile homeotic protein (gene fsh). required maternally for proper expression of other homeotic 
genes involved in pattern formation, such as Ubx. 

- Drosophila brahma protein (gene brm), a protein required for the activation of multiple homeotic genes.

- Mammalian homologs of brahma. In human, three brahma-like proteins are known: SNF2a (hBRM), SNF2b and BRG1.

- Human BS69, a protein that binds to adenovirus E1A and inhibits E1A transactivation.

- Human peregrin (or Br140).

- Yeast BDF1 [3], a transcription factor involved in the expression of a broad class of genes including snRNAs.

- Yeast GCN5, a general transcriptional activator operating in concert with certain other DNA-binding transcriptional activators, such as GCN4, HAP2/3/4 or ADA2.

- Yeast NPS1/STH1, involved in G(2) phase control in mitosis.

- Yeast SNF2/SWI2, which is part of a complex with the SNF5, SNF6, SWI3 and ADR6/SWI1 proteins. This SWI complex is involved in transcriptional activation.

- Yeast SPT7, a transcriptional activator of Ty elements and possibly other genes.

- Caenorhabditis elegans protein cbp-1.

- Yeast hypothetical protein YGR056w 

- Yeast hypothetical protein YKR008w.

- Yeast hypothetical protein L9638.1.

[1958] Some proteins contain a region which, while similar to some extent to a classical bromodomain, diverges from it either by lacking part of the domain or by an insertion. These proteins are:

- Mammalian protein HRX (also known as ALL-1 or MLL), a protein involved in translocations leading to acute leukemias, which possibly acts as a transcriptional regulatory factor. HRX contains a region similar to the C-terminal half of the bromodomain.

- Caenorhabditis elegans hypothetical protein ZK783.4. The bromodomain of this protein has a 23 amino-acid insertion.

- Yeast protein YTA7. This protein contains a region with significant similarity to the C-terminal half of the bromodomain. As it is a member of the AAA family (see <PDOC00572>), it is also in a functionally different context.

[1959] The above proteins generally contain a single bromodomain, but some of them contain two copies; this is the case for BDF1, CCG1, fsh, RING3, YKR008w and L9638.1.
[1960] The exact function of this domain is not yet known, but it is thought to be involved in protein-protein interactions and may be important for the assembly or activity of multicomponent complexes involved in transcriptional activation.
[1961] The consensus pattern that has been developed spans a major part of the bromodomain; more sensitive detection is available through a profile that spans the whole domain.

References

[1962]

[1] Haynes S.R., Dollard C., Winston F., Beck S., Trowsdale J., Dawid I.B. Nucleic Acids Res. 20:2603-2603(1992).

[2] Tamkun J.W., Deuring R., Scott M.P., Kissinger M., Pattatucci A.M., Kaufman T.C., Kennison J.A. Cell 68:561-572(1992).

[3] Tamkun J.W. Curr. Opin. Genet. Dev. 5:473-477(1995).

[1963] 808. (CH) Actinin-type actin-binding domain signatures
PROSITE cross-reference(s): PS00019; ACTININ_1, PS00020; ACTININ_2

[1964] Alpha-actinin is an F-actin cross-linking protein which is thought to anchor actin to a variety of intracellular structures [1]. The actin-binding domain of alpha-actinin seems to reside in the first 250 residues of the protein. A similar actin-binding domain has been found in the N-terminal region of many different actin-binding proteins [2,3]:

- In the beta chain of spectrin (or fodrin).

- In dystrophin, the protein defective in Duchenne muscular dystrophy (DMD) and which may play a role in anchoring 
the cytoskeleton to the plasma membrane. 

- In the slime mold gelation factor (or ABP-120).

- In actin-binding protein ABP-280 (or filamin), a protein that links actin filaments to membrane glycoproteins.

- In fimbrin (or plastin), an actin-bundling protein. Fimbrin differs from the above proteins in that it contains two tandem copies of the actin-binding domain, and these copies are located in the C-terminal part of the protein.

[1965] Two conserved regions were selected as signature patterns for this type of domain. The first of these regions is located at the beginning of the domain, while the second is located in the central section and has been shown to be essential for the binding of actin.

[1966] Consensus pattern: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N

Consensus pattern: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-[LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-W-x-[LIVM](2)

[1] Schleicher M., Andre E., Harmann A., Noegel A.A. Dev. Genet. 9:521-530(1988).
[2] Matsudaira P. Trends Biochem. Sci. 16:87-92(1991).
[3] Dubreuil R.R. BioEssays 13:219-226(1991).

[1967] 809. (COX1) Heme-copper oxidase subunit I, copper B binding region signature
PROSITE cross-reference(s): PS00077; COX1

Heme-copper respiratory oxidases [1] are oligomeric integral membrane protein complexes that catalyze the terminal step in the respiratory chain: they transfer electrons from cytochrome c or a quinol to oxygen. Some terminal oxidases generate a transmembrane proton gradient across the plasma membrane (prokaryotes) or the mitochondrial inner membrane (eukaryotes). The enzyme complex consists of 3-4 subunits in prokaryotes and up to 13 polypeptides in mammals, of which only the catalytic subunit (equivalent to mammalian subunit 1 (CO I)) is found in all heme-copper respiratory oxidases. The presence of a bimetallic center (formed by a high-spin heme and copper B) as well as a low-spin heme, both ligated to six conserved histidine residues near the outer side of four transmembrane spans within CO I, is common to all family members [2-4].

[1968] In contrast to eukaryotes, the respiratory chain of prokaryotes is branched to multiple terminal oxidases. The enzyme complexes vary in heme and copper composition, substrate type and substrate affinity. The different respiratory oxidases allow the cells to customize their respiratory systems according to a variety of environmental growth conditions.

[1969] Recently, a component of an anaerobic respiratory chain has also been found to contain the copper B binding signature of this family: nitric oxide reductase (NOR), which exists in denitrifying species of Archaea and Eubacteria.
[1970] Enzymes that belong to this family are: 

- Mitochondrial-type cytochrome c oxidase (EC 1.9.3.1), which uses cytochrome c as electron donor. The electrons are transferred via copper A (Cu(A)) and heme a to the bimetallic center of CO I, which is formed by a pentacoordinated heme a and copper B (Cu(B)). Subunit 1 contains 12 transmembrane regions. Cu(B) is said to be ligated to three of the conserved histidine residues within transmembrane segments 6 and 7.

- Quinol oxidase from prokaryotes, which transfers electrons from a quinol to the binuclear center of polypeptide I. This category of enzymes includes the Escherichia coli cytochrome o terminal oxidase complex, a component of the aerobic respiratory chain that predominates when cells are grown at high aeration.

- FixN, the catalytic subunit of a cytochrome c oxidase expressed in nitrogen-fixing bacteroids living in root nodules. Its high affinity for oxygen allows oxidative phosphorylation under low oxygen concentrations. A similar enzyme has been found in other purple bacteria.

- Nitric oxide reductase (EC 1.7.99.7) from Pseudomonas stutzeri. NOR reduces nitric oxide to nitrous oxide. It is a heterodimer of norC and the catalytic subunit norB. The latter contains the 6 invariant histidine residues and 12 transmembrane segments [5].

[1971] The copper-binding region was used as a signature pattern.

[1972] Consensus pattern: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H [The three H's are copper B ligands]

[1973] Note: cytochrome bd complexes do not belong to this family.
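The bracketed note on the copper B pattern marks which residues of a match are functionally important. As a hedged sketch, assuming a hand translation of the PROSITE pattern into a Python regular expression, the three histidines can be wrapped in capture groups so that their positions in a query sequence are reported. The names `COX1_COPPER_B` and `copper_b_histidines` are hypothetical, and the query is a synthetic string built only to satisfy the pattern, not a real oxidase.

```python
import re
from typing import List, Optional

# Copper B signature from the text, [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H,
# translated by hand; the three H's (putative copper B ligands) are capture groups.
COX1_COPPER_B = re.compile(r"[YWG][LIVFYWTA]{2}[VGS](H)[LNP].V.{44,47}(H)(H)")


def copper_b_histidines(sequence: str) -> Optional[List[int]]:
    """Return 1-based positions of the three putative copper B histidines, or None."""
    match = COX1_COPPER_B.search(sequence)
    if match is None:
        return None
    return [match.start(group) + 1 for group in (1, 2, 3)]


# Synthetic sequence: motif start, a 45-residue spacer, then the H-H pair.
query = "M" + "YLLVHLAV" + "A" * 45 + "HH" + "K"
print(copper_b_histidines(query))  # [6, 55, 56]
```

When the variable spacer x(44,47) admits more than one alignment, `re.search` returns only the first match it finds (leftmost start, longest workable spacer), so this sketch reports a single set of ligand positions per sequence.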
[1] Garcia-Horsman J.A., Barquera B., Rumbley J., Ma J., Gennis R.B. J. Bacteriol. 176:5587-5600(1994).
[2] Castresana J., Luebben M., Saraste M., Higgins D.G. EMBO J. 13:2516-2525(1994).
[3] Capaldi R.A., Malatesta F., Darley-Usmar V.M. Biochim. Biophys. Acta 726:135-148(1983).
[4] Holm L., Saraste M., Wikstrom M. EMBO J. 6:2819-2823(1987).
[5] Saraste M., Castresana J. FEBS Lett. 341:1-4(1994).

[1974] 810. (dehydrog_molyb) Eukaryotic molybdopterin oxidoreductases signature
PROSITE cross-reference(s): PS00559; MOLYBDOPTERIN_EUK

[1975] A number of different eukaryotic oxidoreductases that require and bind a molybdopterin cofactor have been 
shown [1] to share a few regions of sequence similarity. These enzymes are: 

- Xanthine dehydrogenase (EC 1.1.1.204), which catalyzes the oxidation of xanthine to uric acid with the concomitant reduction of NAD. Structurally, this enzyme of about 1300 amino acids consists of at least three distinct domains: an N-terminal 2Fe-2S ferredoxin-like iron-sulfur binding domain (see <PDOC00175>), a central FAD/NAD-binding domain and a C-terminal Mo-pterin domain.

- Aldehyde oxidase (EC 1.2.3.1), which catalyzes the oxidation of aldehydes into acids. Aldehyde oxidase is highly similar to xanthine dehydrogenase in its sequence and domain structure.

- Nitrate reductase (EC 1.6.6.1), which catalyzes the reduction of nitrate to nitrite. Structurally, this enzyme of about 900 amino acids consists of an N-terminal Mo-pterin domain, a central cytochrome b5-type heme-binding domain (see <PDOC00170>) and a C-terminal FAD/NAD-binding cytochrome reductase domain.

- Sulfite oxidase (EC 1.8.3.1), which catalyzes the oxidation of sulfite to sulfate. Structurally, this enzyme of about 460 amino acids consists of an N-terminal cytochrome b5-binding domain followed by a Mo-pterin domain.

[1976] There are a few conserved regions in the sequence of the molybdopterin-binding domain of these enzymes. The pattern used to detect these proteins is based on one of them. It contains a cysteine residue which could be involved in binding the molybdopterin cofactor.

[1977] Consensus pattern: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(2)-[DEN]

[1] Wootton J.C., Nicolson R.E., Cock J.M., Walters D.E., Burke J.F., Doyle W.A., Bray R.C. Biochim. Biophys. Acta 1057:157-185(1991).

811. (DNA_ligase) ATP-dependent DNA ligase signatures

PROSITE cross-reference(s): PS00697; DNA_LIGASE_A1, PS00333; DNA_LIGASE_A2

[1978] DNA ligase (polydeoxyribonucleotide synthase) is the enzyme that joins two DNA fragments by catalyzing the formation of an internucleotide ester bond between phosphate and deoxyribose. It is active during DNA replication, DNA repair and DNA recombination. There are two forms of DNA ligase: one requires ATP (EC 6.5.1.1), the other NAD (EC 6.5.1.2).

[1979] Eukaryotic, archaebacterial, viral and phage DNA ligases are ATP-dependent. During the first step of the joining reaction, the ligase interacts with ATP to form a covalent enzyme-adenylate intermediate. A conserved lysine residue is the site of adenylation [1,2].

[1980] Apart from the active site region, the only conserved region common to all ATP-dependent DNA ligases is found [3] in the C-terminal section and contains a conserved glutamate as well as four positions with conserved basic residues.

[1981] Signature patterns were developed for both conserved regions. 

[1982] Consensus pattern: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM] [K is the active site residue]

[1983] Consensus pattern: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-x(2,6)-[KR]

Sequences known to belong to this class detected by the pattern: ALL, except for archaebacterial DNA ligases.

[1] Tomkinson A.E., Totty N.F., Ginsburg M., Lindahl T. Proc. Natl. Acad. Sci. U.S.A. 88:400-404(1991).
[2] Lindahl T., Barnes D.E. Annu. Rev. Biochem. 61:251-281(1992).
[3] Kletzin A. Nucleic Acids Res. 20:5389-5398(1992).

[1984] 812. (FAD_Gly3P_dh) FAD-dependent glycerol-3-phosphate dehydrogenase signatures
PROSITE cross-reference(s): PS00977; FAD_G3PDH_1, PS00978; FAD_G3PDH_2

[1985] FAD-dependent glycerol-3-phosphate dehydrogenase (EC 1.1.99.5) (GPD) catalyzes the conversion of glycerol-3-phosphate into dihydroxyacetone phosphate. In bacteria [1] it is associated with the utilization of glycerol coupled to respiration. In Escherichia coli, two isozymes are known: one expressed under anaerobic conditions (gene glpA) and one under aerobic conditions (gene glpD). In eukaryotes, a mitochondrial form of GPD participates in the glycerol phosphate shuttle in conjunction with an NAD-dependent cytoplasmic GPD (EC 1.1.1.8) [2,3].
[1986] These enzymes are proteins of about 60 to 70 kDa which contain a probable FAD-binding domain in their N-terminal extremity. The mammalian enzyme differs from the bacterial or yeast proteins by having an EF-hand calcium-binding region (see <PDOC00018>) in its C-terminal extremity.

[1987] Two signature patterns were developed: one based on the first half of the FAD-binding domain, and one corresponding to a conserved region in the central part of these enzymes.
[1988] Consensus pattern: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G
Consensus pattern: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A

[1] Austin D., Larson T.J. J. Bacteriol. 173:101-107(1991).
[2] Roennow B., Kielland-Brandt M.C. Yeast 9:1121-1130(1993).
[3] Brown L.J., McDonald M.J., Lehn D.A., Moran S.M. J. Biol. Chem. 269:14363-14366(1994).

[1989] 813. (Fapy_DNA_glyco) Formamidopyrimidine-DNA glycosylase signature
PROSITE cross-reference(s): PS01242; FPG

[1990] Formamidopyrimidine-DNA glycosylase (EC 3.2.2.23) [1] (Fapy-DNA glycosylase) (gene fpg) is a bacterial enzyme involved in DNA repair which excises oxidized purine bases to release 2,6-diamino-4-hydroxy-5N-methylformamidopyrimidine (Fapy) and 7,8-dihydro-8-oxoguanine (8-OxoG) residues. In addition to its glycosylase activity, FPG can also nick DNA at apurinic/apyrimidinic sites (AP sites). FPG is a monomeric protein of about 32 kDa which binds and requires zinc for its activity.

[1991] The binding site for zinc seems to be located in the C-terminal part of the enzyme, where four conserved and essential [2] cysteines are located. A signature pattern was developed based on this region.

[1992] Consensus pattern: C-x(2,4)-C-x-[GTAQ]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FYI]-C-x(2)-C-Q
[The four C's are putative zinc ligands]

[1] Duwat P., de Oliveira R., Ehrlich S.D., Boiteux S. Microbiology 141:411-417(1995).
[2] O'Connor T.R., Graves R.J., de Murcia G., Castaing B., Laval J. J. Biol. Chem. 268:9063-9070(1993).

[1993] 814. (G_glu_transpept) Gamma-glutamyltranspeptidase signature
PROSITE cross-reference(s): PS00462; G_GLU_TRANSPEPTIDASE

[1994] Gamma-glutamyltranspeptidase (EC 2.3.2.2) (GGT) [1] catalyzes the transfer of the gamma-glutamyl moiety of glutathione to an acceptor that may be an amino acid, a peptide or water (forming glutamate). GGT plays a key role in the gamma-glutamyl cycle, a pathway for the synthesis and degradation of glutathione. In prokaryotes and eukaryotes, it is an enzyme that consists of two polypeptide chains, a heavy and a light subunit, processed from a single chain precursor. The active site of GGT is known to be located in the light subunit.

[1995] The sequences of mammalian and bacterial GGT show a number of regions of high similarity [2]. Pseudomonas cephalosporin acylases (EC 3.5.1.-) that convert 7-beta-(4-carboxybutanamido)-cephalosporanic acid (GL-7ACA) into 7-aminocephalosporanic acid (7ACA) and glutaric acid are evolutionarily related to GGT and also show some GGT activity [3]. Like GGT, these GL-7ACA acylases are composed of two subunits.
[1996] One of the conserved regions corresponds to the N-terminal extremity of the mature light chains of these enzymes. This region was used as a signature pattern.

[1997] Consensus pattern: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G
[1] Tate S.S., Meister A. Meth. Enzymol. 113:400-419(1985).
[2] Suzuki H., Kumagai H., Echigo T., Tochikura T. J. Bacteriol. 171:5169-5172(1989).
[3] Ishiye M., Niwa M. Biochim. Biophys. Acta 1132:233-239(1992).



[1998] 815. G-protein gamma subunit profile 

PROSITE cross-reference(s): PS50058; G_PROTEIN_GAMMA 

[1999] Guanine nucleotide-binding proteins (G proteins) [1] act as intermediaries in the transduction of signals generated by transmembrane receptors. G proteins consist of three subunits (alpha, beta and gamma). The alpha subunit binds to and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear, but they seem to be required for the replacement of GDP by GTP as well as for membrane anchoring and receptor recognition.
[2000] The gamma subunits are small proteins (from 70 to 110 residues) that are bound to the membrane via an isoprenyl group (either a farnesyl or a geranylgeranyl) covalently linked to their C-terminus. In mammals there are at least 12 different isoforms of gamma subunits.

[2001] The Caenorhabditis elegans protein egl-10, which is a regulator of G-protein signalling, contains a G-protein 
gamma-like domain. 

[2002] A profile was developed that spans the complete length of the gamma subunit.

[1] Pennington S.R. Protein Prof. 2:16-315(1995).
[2003] 816. GNS1/SUR4 family signature 
PROSITE cross-reference(s): PS01188; GNS1_SUR4 
[2004] The following group of eukaryotic integral membrane proteins, whose exact function has not yet been clearly established, are evolutionarily related [1]:

- Yeast GNS1 [2], a protein involved in the synthesis of 1,3-beta-glucan.

- Yeast SUR4 (or APA1, SRE1) [3], a protein that could act in a glucose-signaling pathway that controls the expression of several genes that are transcriptionally regulated by glucose.

- Yeast hypothetical protein YJL196c.

- Caenorhabditis elegans hypothetical protein C40H1.4.

- Caenorhabditis elegans hypothetical protein D2024.3.



[2005] The proteins have from 290 to 435 amino acid residues. Structurally, they seem to be formed of three sections: an N-terminal region with two transmembrane domains, a central hydrophilic loop, and a C-terminal region that contains from one to three transmembrane domains. A conserved region that contains three histidines, located in the hydrophilic loop, was selected as a signature pattern.
Consensus pattern: L-x-F-L-H-x-Y-H-H



[1] Bairoch A. Unpublished observations (1996).
[2] El-Sherbeini M., Clemas J.A. J. Bacteriol. 177:3227-3234(1995).
[3] Garcia-Arranz M., Maldonado A.M., Mazon M.J., Portillo F. J. Biol. Chem. 269:18076-18082(1994).

[2006] 817. Immunoglobulins and major histocompatibility complex proteins signature
PROSITE cross-reference(s): PS00290; IG_MHC

[2007] The basic structure of immunoglobulin (Ig) [1] molecules is a tetramer of two light chains and two heavy chains linked by disulfide bonds. There are two types of light chains, kappa and lambda, each composed of a constant domain (CL) and a variable domain (VL). There are five types of heavy chains (alpha, delta, epsilon, gamma and mu), all consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains (CH1 to CH4).

[2008] The major histocompatibility complex (MHC) molecules are made of two chains. In class I [2], the alpha chain is composed of three extracellular domains, a transmembrane region and a cytoplasmic tail. The beta chain (beta-2-microglobulin) is composed of a single extracellular domain. In class II [3], both the alpha and the beta chains are composed of two extracellular domains, a transmembrane region and a cytoplasmic tail.

[2009] It is known [4,5] that the Ig constant chain domains and a single extracellular domain in each type of MHC chain are related. These homologous domains are approximately one hundred amino acids long and include a conserved intradomain disulfide bond. A small pattern around the C-terminal cysteine involved in this disulfide bond can be used to detect this category of Ig-related proteins.

[2010] Consensus pattern: [FY]-x-C-x-[VA]-x-H
Sequences known to belong to this class detected by the pattern:
- Ig heavy chains type Alpha C region: all, in CH2 and CH3.
- Ig heavy chains type Delta C region: all, in CH3.
- Ig heavy chains type Epsilon C region: all, in CH1, CH3 and CH4.
- Ig heavy chains type Gamma C region: all, in CH3 and also CH1 in some cases.
- Ig heavy chains type Mu C region: all, in CH2, CH3 and CH4.
- Ig light chains type Kappa C region: in all CL except rabbit and Xenopus.
- Ig light chains type Lambda C region: in all CL except rabbit.
- MHC class I alpha chains: all, in alpha-3 domains, including in the cytomegalovirus MHC-I homologous protein [6].
- Beta-2-microglobulin: all.
- MHC class II alpha chains: all, in alpha-2 domains.
- MHC class II beta chains: all, in beta-2 domains.

[1] Gough N. Trends Biochem. Sci. 6:203-205(1981).
[2] Klein J., Figueroa F. Immunol. Today 7:41-44(1986).
[3] Figueroa F., Klein J. Immunol. Today 7:78-81(1986).
[4] Orr H.T., Lancet D., Robb R.J., Lopez de Castro J.A., Strominger J.L. Nature 282:266-270(1979).
[5] Cushley W., Owen M.J. Immunol. Today 4:88-92(1983).
[6] Beck S., Barrell B.G. Nature 331:269-272(1988).

[2011] 818. (IGFBP) Insulin-like growth factor binding proteins signature
PROSITE cross-reference(s): PS00222; IGF_BINDING

[2012] The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding proteins in extracellular fluids with
high affinity [1,2,3]. These IGF-binding proteins (IGFBP) prolong the half-life of the IGFs and have been shown to either
inhibit or stimulate the growth promoting effects of the IGFs on cell culture. They seem to alter the interaction of IGFs
with their cell surface receptors. There are at least six different IGFBPs and they are structurally related.
[2013] The following growth-factor inducible proteins are structurally related to IGFBPs and could function as
growth-factor binding proteins [4,5]:

- Mouse protein cyr61 and its probable chicken homolog, protein CEF-10.

- Human connective tissue growth factor (CTGF) and its mouse homolog, protein FISP-12.

- Vertebrate protein NOV. 

[2014] As a signature pattern, a conserved cysteine-rich region located in the N-terminal section of these proteins is
used.

[2015] Consensus pattern: G-C-[GS]-C-C-x(2)-C-A-x(6)-C

Sequences known to belong to this class detected by the pattern: ALL, except for IGFBP-6's.
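A scan with a signature like this one usually reports every hit in a sequence rather than only the first. A minimal sketch, assuming the IGFBP pattern is written directly as a Python regex (x(n) becomes .{n}); the helper name `find_all_hits` is hypothetical, and the demonstration sequence is synthetic, built only to contain two copies of the signature.

```python
import re

# G-C-[GS]-C-C-x(2)-C-A-x(6)-C, translated by hand into a Python regex.
IGFBP_SIGNATURE = r"GC[GS]CC.{2}CA.{6}C"

def find_all_hits(regex: str, seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for every position where regex matches.

    Wrapping the regex in a zero-width lookahead (?=(...)) lets re.finditer
    test every start position, so overlapping occurrences are reported too.
    """
    return [(m.start() + 1, m.group(1)) for m in re.finditer(f"(?=({regex}))", seq)]

if __name__ == "__main__":
    # Synthetic sequence containing two copies of the signature.
    seq = "MK" + "GCGCCAKCAPLSEGAC" + "TT" + "GCSCCQRCALLDGKSC" + "E"
    for start, segment in find_all_hits(IGFBP_SIGNATURE, seq):
        print(start, segment)
```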
[1] Rechler M.M. Vitam. Horm. 47:1-114(1993).
[2] Shimasaki S., Ling N. Prog. Growth Factor Res. 3:243-266(1991).
[3] Clemmons D.R. Trends Endocrinol. Metab. 1:412-417(1990).
[4] Bradham D.M., Igarashi A., Potter R.L., Grotendorst G.R. J. Cell Biol. 114:1285-1294(1991).
[5] Maloisel V., Martinerie C., Dambrine G., Plassiart G., Brisac M., Crochet J., Perbal B. Mol. Cell. Biol. 12:10-21(1992).



[2016] 819. LMWPc: Low molecular weight phosphotyrosine protein phosphatase
Number of members: 34 

[2017] [1] Medline: 94329182. The crystal structure of a low-molecular-weight phosphotyrosine protein phosphatase.
Su XD, Taddei N, Stefani M, Ramponi G, Nordlund P; Nature 1994;370:575-578. 
[2018] 820. (myosin_head) ATP/GTP-binding site motif A (P-loop) 
PROSITE cross-reference(s): PS00017; ATP_GTP_A 

[2019] From sequence comparisons and crystallographic data analysis it has been shown [1,2,3,4,5,6] that an
appreciable proportion of proteins that bind ATP or GTP share a number of more or less conserved sequence motifs.
The best conserved of these motifs is a glycine-rich region, which typically forms a flexible loop between a beta-strand 
and an alpha-helix. This loop interacts with one of the phosphate groups of the nucleotide. This sequence motif is 
generally referred to as the 'A' consensus sequence [1] or the 'P-loop' [5]. 

[2020] There are numerous ATP- or GTP-binding proteins in which the P-loop is found. A number of protein families
for which the relevance of the presence of such a motif has been noted are listed below:

ATP synthase alpha and beta subunits (see <PDOC00137>). 
Myosin heavy chains. 

Kinesin heavy chains and kinesin-like proteins (see <PDOC00343>). 
Dynamins and dynamin-like proteins (see <PDOC00362>).
Guanylate kinase (see <PDOC00670>). 
Thymidine kinase (see <PDOC00524>). 
Thymidylate kinase (see <PDOC01034>). 
Shikimate kinase (see <PDOC00868>). 

Nitrogenase iron protein family (nifH/frxC) (see <PDOC00580>).

ATP-binding proteins involved in 'active transport' (ABC transporters) [7] (see <PDOC00185>).
DNA and RNA helicases [8,9,10].
GTP-binding elongation factors (EF-Tu, EF-1alpha, EF-G, EF-2, etc.).
Ras family of GTP-binding proteins (Ras, Rho, Rab, Ral, Ypt1, SEC4, etc.).
Nuclear protein ran (see <PDOC00859>). 
ADP-ribosylation factors family (see <PDOC00781>). 
Bacterial dnaA protein (see <PDOC00771>). 
Bacterial recA protein (see <PDOC00131>).
Bacterial recF protein (see <PDOC00539>). 

Guanine nucleotide-binding proteins alpha subunits (Gi, Gs, Gt, Go, etc.).
DNA mismatch repair proteins mutS family (see <PDOC00388>).
Bacterial type II secretion system protein E (see <PDOC00567>). 

[2021] Not all ATP- or GTP-binding proteins are picked up by this motif. A number of proteins escape detection
because the structure of their ATP-binding site is completely different from that of the P-loop. Examples of such proteins
are the E1-E2 ATPases or the glycolytic kinases. In other ATP- or GTP-binding proteins the flexible loop exists in a
slightly different form; this is the case for tubulins or protein kinases. A special mention must be reserved for adenylate
kinase, in which there is a single deviation from the P-loop pattern: in the last position Gly is found instead of Ser or Thr.
[2022] Consensus pattern: [AG]-x(4)-G-K-[ST]
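The adenylate kinase exception described above can be seen by scanning two short peptides with the P-loop pattern. A minimal sketch: the strict regex is a direct translation of [AG]-x(4)-G-K-[ST], while the relaxed regex, which also accepts Gly in the last position, is a hypothetical variant for illustration and is not part of the published signature. Both peptides are illustrative fragments, one Ras-like (ending G-K-S) and one adenylate-kinase-like (ending G-K-G).

```python
import re

P_LOOP = re.compile(r"[AG].{4}GK[ST]")           # [AG]-x(4)-G-K-[ST] as given above
P_LOOP_RELAXED = re.compile(r"[AG].{4}GK[STG]")  # hypothetical variant: also accepts Gly last

def p_loop_hits(peptide: str) -> tuple[bool, bool]:
    """Return (matches the strict pattern, matches the relaxed variant)."""
    return (P_LOOP.search(peptide) is not None,
            P_LOOP_RELAXED.search(peptide) is not None)

if __name__ == "__main__":
    ras_like = "VVVGAGGVGKSALT"   # illustrative Ras-like fragment, ends G-K-S
    ak_like = "FVVGGPGSGKGTQC"    # illustrative adenylate-kinase-like fragment, ends G-K-G
    print(p_loop_hits(ras_like))  # (True, True)
    print(p_loop_hits(ak_like))   # (False, True)
```

Loosening the last position catches the adenylate kinase-type loop but also admits more unrelated sequences, the usual trade-off between sensitivity and specificity when designing a pattern.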

[1] Walker J.E., Saraste M., Runswick M.J., Gay N.J. EMBO J. 1:945-951(1982).
[2] Moller W., Amons R. FEBS Lett. 186:1-7(1985).
[3] Fry D.C., Kuby S.A., Mildvan A.S. Proc. Natl. Acad. Sci. U.S.A. 83:907-911(1986).
[4] Dever T.E., Glynias M.J., Merrick W.C. Proc. Natl. Acad. Sci. U.S.A. 84:1814-1818(1987).
[5] Saraste M., Sibbald P.R., Wittinghofer A. Trends Biochem. Sci. 15:430-434(1990).
[6] Koonin E.V. J. Mol. Biol. 229:1165-1174(1993).
[7] Higgins C.F., Hyde S.C., Mimmack M.M., Gileadi U., Gill D.R., Gallagher M.P. J. Bioenerg. Biomembr. 22:571-592(1990).
[8] Hodgman T.C. Nature 333:22-23(1988) and Nature 333:578-578(1988) (Errata).
[9] Linder P., Lasko P.F., Ashburner M., Leroy P., Nielsen P.J., Nishi K., Schnier J., Slonimski P.P. Nature 337:121-122(1989).
[10] Gorbalenya A.E., Koonin E.V., Donchenko A.P., Blinov V.M. Nucleic Acids Res. 17:4713-4730(1989).

[2023] 821. PE: PE family 

This family is named after a PE motif near the amino terminus of the domain. The PE family of proteins all contain an
amino-terminal region of about 110 amino acids. The carboxyl termini of this family are variable and fall into several
classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The
function of these proteins is uncertain, but it has been suggested that they may be related to antigenic variation of
Mycobacterium tuberculosis [1]. Number of members: 88

[2024] [1] Medline: 98295987. Deciphering the biology of Mycobacterium tuberculosis from the complete genome
sequence. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE
3rd, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S,
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG, et al; Nature 1998;393:537-544.
[2025] 822. (RNB) Ribonuclease II family signature 
PROSITE cross-reference(s): PS01175; RIBONUCLEASE_II

[2026] On the basis of sequence similarities, the following bacterial and eukaryotic proteins seem to form a family: 

- Escherichia coli and related bacteria ribonuclease II (EC 3.1.13.1) (RNase II) (gene rnb) [1]. RNase II is an
exonuclease involved in mRNA decay. It degrades mRNA by hydrolyzing single-stranded polyribonucleotides
processively in the 3' to 5' direction.

- Bacterial protein vacB. In Shigella flexneri, vacB has been shown to be required for the expression of virulence
genes at the posttranscriptional level. 

- Yeast protein SSD1 (or SRK1), which is implicated in the control of the cell cycle G1 phase.

- Yeast protein DIS3 [2], which binds to ran (GSP1) and enhances the nucleotide-releasing activity of RCC1 on ran.

- Fission yeast protein dis3, which is implicated in mitotic control. 

- Neurospora crassa cyt-4, a mitochondrial protein required for RNA 5' and 3' end processing and splicing.

- Yeast protein MSU1, which is involved in mitochondrial biogenesis.

- Synechocystis strain PCC 6803 protein zam [3], which controls resistance to the carbonic anhydrase inhibitor
acetazolamide.

- Caenorhabditis elegans hypothetical protein F48E8.6.

[2027] The size of these proteins ranges from 644 residues (rnb) to 1250 (SSD1). While their sequences are highly
divergent, they share a conserved domain in their C-terminal section [4]. It is possible that this domain plays a role in
a putative exonuclease function that would be common to all these proteins. A signature pattern was developed based
on the core of this conserved domain.

Consensus pattern: ...-[FYE]-[GSTAM]-[LIVM]-x(4,5)-Y-[STAL]-x-[FWVAC]-[TV]...
[1] Zilhao R., Camelo L., Arraiano C.M. Mol. Microbiol. 8:43-51(1993).
[2] Noguchi E., Hayashi N., Azuma Y., Seki T., Nakamura M., Nakashima N., Yanagida M., He X., Mueller U., Sazer S., Nishimoto T. EMBO J. 15:5595-5605(1996).
[3] Beuf L., Bedu S., Cami B., Joset F. Plant Mol. Biol. 27:779-788(1995).
[4] Mian I.S. Nucleic Acids Res. 25:3187-3195(1997).

[2029] 823. Src homology 2 (SH2) domain profile
PROSITE cross-reference(s): PS50001; SH2 

[2030] The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a
conserved sequence region between the oncoproteins Src and Fps [1]. Similar sequences were later found in many
other intracellular signal-transducing proteins [2]. SH2 domains function as regulatory modules of intracellular signalling
cascades by interacting with high affinity with phosphotyrosine-containing target peptides in a sequence-specific and
strictly phosphorylation-dependent manner [3,4,5,6].

[2031] The SH2 domain has a conserved 3D structure consisting of two alpha helices and six to seven beta-strands.
The core of the domain is formed by a continuous beta-meander composed of two connected beta-sheets [7].
[2032] So far, SH2 domains have been identified in the following proteins: 

- Many vertebrate, invertebrate and retroviral cytoplasmic (non-receptor) protein tyrosine kinases, in particular in
the Src, Abl, Btk, Csk and ZAP70 families of kinases.

- Mammalian phosphatidylinositol-specific phospholipase C gamma-1 and -2. Two copies of the SH2 domain are
found in those proteins in between the catalytic 'X-' and 'Y-boxes' (see <PDOC50007>).

- Mammalian phosphatidylinositol 3-kinase regulatory p85 subunit.

- Some vertebrate and invertebrate protein-tyrosine phosphatases. 
- Mammalian Ras GTPase-activating protein (GAP).

- Adaptor proteins mediating binding of guanine nucleotide exchange factors to growth factor receptors: vertebrate 
GRB2, Caenorhabditis elegans sem-5 and Drosophila DRK.

- Mammalian Vav oncoprotein, a guanine-nucleotide exchange factor of the CDC24 family. 

- Miscellaneous proteins interacting with vertebrate receptor protein tyrosine kinases: oncoprotein Crk, mammalian
cytoplasmic proteins Nck, Shc.

- STAT proteins (signal transducers and activators of transcription).

- Chicken tensin.

- Yeast transcriptional control protein SPT6. 

[2033] The profile developed to detect SH2 domains is based on a structural alignment consisting of 8 gap-free 
blocks and 7 linker regions totaling 92 match positions. 

[1] Sadowski I., Stone J.C., Pawson T. Mol. Cell. Biol. 6:4396-4408(1986).
[2] Russell R.B., Breed J., Barton G.J. FEBS Lett. 304:15-20(1992).
[3] Marengere L.E.M., Pawson T. J. Cell Sci. Suppl. 18:97-104(1994).
[4] Pawson T., Schlessinger J. Curr. Biol. 3:434-442(1993).
[5] Mayer B.J., Baltimore D. Trends Cell Biol. 3:8-13(1993).
[6] Pawson T. Nature 373:573-580(1995).
[7] Kuriyan J., Cowburn D. Curr. Opin. Struct. Biol. 3:828-837(1993).

[2034] 824. Sulfate transporters signature 

PROSITE cross-reference(s): PS01130; SULFATE_TRANSP 

[2035] A number of proteins involved in the transport of sulfate across a membrane as well as some yet
uncharacterized proteins have been shown [1,2] to be evolutionarily related. These proteins are:

- Neurospora crassa sulfate permease II (gene cys-14).

- Yeast sulfate permeases (genes SUL1 and SUL2). 

- Rat sulfate anion transporter 1 (SAT-1).

- Mammalian DTDST, a probable sulfate transporter which, in humans, is involved in the genetic disease diastrophic
dysplasia (DTD).

- Sulfate transporters 1, 2 and 3 from the legume Stylosanthes hamata.

- Human pendrin (gene PDS), which is involved in a number of hearing loss genetic diseases. 
- Human protein DRA (Down-Regulated in Adenoma).

- Soybean early nodulin 70. 
- Escherichia coli hypothetical protein ychM.
- Caenorhabditis elegans hypothetical protein F41D9.5.

[2036] As expected from their transport function, these proteins are highly hydrophobic and seem to contain about 12
transmembrane domains. The best conserved region seems to be located in the second transmembrane region and
is used as a signature pattern.

[2037] Consensus pattern: [PAV]-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVFYA]-[LIVST]-[YI]-x(3)-[GA]-[GST]-S-[KR]
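The hydrophobicity and the roughly 12 transmembrane domains mentioned above are commonly examined with a sliding-window hydropathy plot. The sketch below swaps in the Kyte-Doolittle scale, which is not a method described in this entry: it averages the scale over 19-residue windows and reports runs of windows whose mean is at least 1.6, a threshold often used to flag candidate transmembrane segments. The function name and default parameters are assumptions, the result is only a rough indicator, and the test sequence is synthetic.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return 1-based inclusive spans covered by consecutive windows with mean hydropathy >= threshold.

    Only the 20 standard one-letter residue codes are accepted (others raise KeyError).
    """
    segments = []
    run_start = last = None
    for i in range(len(seq) - window + 1):
        mean = sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        if mean >= threshold:
            if run_start is None:
                run_start = i       # first window of a new hydrophobic run
            last = i                # most recent window still above threshold
        elif run_start is not None:
            segments.append((run_start + 1, last + window))
            run_start = None
    if run_start is not None:       # close a run that reaches the last window
        segments.append((run_start + 1, last + window))
    return segments

if __name__ == "__main__":
    # Synthetic sequence: two hydrophobic stretches separated by charged linkers.
    demo = "K" * 10 + "L" * 25 + "K" * 10 + "I" * 22 + "K" * 10
    print(hydrophobic_segments(demo))
```

Because each window spans 19 residues, the reported boundaries are smeared by several positions; the count of runs is more informative than their exact ends.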
[1] Sandal N.N., Marcker K.A. Trends Biochem. Sci. 19:19-19(1994).
[2] Smith F.W., Hawkesford M.J., Prosser I.M., Clarkson D.T. Mol. Gen. Genet. 247:709-715(1995).

[2038] 825. TYA: TYA transposon protein 

Ty elements are yeast transposons. A 5.7 kb transcript codes for p3, a fusion protein of TYA and TYB. The TYA protein is
analogous to the gag protein of retroviruses. TYA is cleaved to form a 46 kDa protein, which can form mature virion-like
particles [1].
Number of members: 59

[2039] [1] Medline: 97404699. Cryo-electron microscopy structure of yeast Ty retrotransposon virus-like particles 
Palmer KJ, Tichelaar W, Myers N, Burns NR, Butcher SJ, Kingsman AJ, Fuller SD, Saibil HR; J Virol 1997;71:6863-6868.

[2040] 826. Aldolase_II
Class II Aldolase and Adducin N-terminal domain. 

This family includes class II aldolases and adducins; the adducins have not been ascribed any enzymatic function.
Number of members: 37

References: 

[2041] 

[1] Medline: 93294819. The spatial structure of the class II L-fuculose-1-phosphate aldolase from Escherichia coli.
Dreyer MK, Schulz GE; J Mol Biol 1993;231:549-553.

[2] Medline: 96256522. Catalytic mechanism of the metal-dependent fuculose aldolase from Escherichia coli as
derived from the structure. Dreyer MK, Schulz GE; J Mol Biol 1996;259:458-466.

[2042] 827. CBD_2 

-!- Two tryptophan residues are involved in cellulose binding. 

-!- Cellulose binding domain found in bacteria. Number of members: 51 

References: 

[2043] [1] Medline: 95284032. Solution structure of a cellulose-binding domain from Cellulomonas fimi by nuclear 
magnetic resonance spectroscopy. Xu GY, Ong E, Gilkes NR, Kilburn DG, Muhandiram DR, Harris-Brandts M, Carver
JP, Kay LE, Harvey TS; Biochemistry 1995;34:6993-7009.
[2044] 828. P 

A unique feature of the eukaryotic subtilisin-like proprotein convertases is the presence of an additional highly
conserved sequence of approximately 150 residues (P domain) located immediately downstream of the catalytic domain.
Number of members: 91

References: 

[2045] 

[1] Medline: 94252314. A C-terminal domain conserved in precursor processing proteases is required for
intramolecular N-terminal maturation of pro-Kex2 protease. Gluschankof P, Fuller RS; EMBO J 1994;13:2280-2288.
[2] Medline: 98225190. Regulatory roles of the P domain of the subtilisin-like prohormone convertases. Zhou A,
Martin S, Lipkind G, LaMendola J, Steiner DF; J Biol Chem 1998;273:11107-11114.

[2046] 829. Uncharacterized protein family UPF0020 signature 
PROSITE cross-reference(s): PS01261; UPF0020 

The following uncharacterized proteins have been shown [1] to share regions of similarities: 

- Escherichia coli hypothetical protein ycbY and HI0116/15, the corresponding Haemophilus influenzae protein.
- Bacillus subtilis hypothetical protein ypsC.

- Synechocystis strain PCC 6803 hypothetical protein slr0064. 

- Methanococcus jannaschii hypothetical proteins MJ0438 and MJ0710.

[2047] These are hydrophilic proteins of about 40 Kd to about 80 Kd. They can be picked up in the database by the
following pattern.

[2048] Consensus pattern: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E
References: 

[2049] [1] Bairoch A. Unpublished observations (1997).
[2050] 830. Uncharacterized protein family UPF0031 signatures 

PROSITE cross-reference(s): PS01049; UPF0031_1, PS01050; UPF0031_2
The following uncharacterized proteins have been shown [1] to share regions of similarities:



Yeast chromosome XI hypothetical protein YKL151c. 
Caenorhabditis elegans hypothetical protein R107.2.
Escherichia coli hypothetical protein yjeF. 
Bacillus subtilis hypothetical protein yxkO. 
Helicobacter pylori hypothetical protein HP1363.
Mycobacterium tuberculosis hypothetical protein MtCY77.05c. 
Mycobacterium leprae hypothetical protein B229_C2_201. 
Synechocystis strain PCC 6803 hypothetical protein sll1433.
Methanococcus jannaschii hypothetical protein MJ1586. 



[2051] These are proteins of about 30 to 40 Kd whose central region is well conserved. They can be picked up in 
the database by the following patterns. 

[2052] Consensus pattern: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT]
Consensus pattern: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM]
[2053] 831. (ACOX)
Acyl-CoA oxidase 

[2054] This is a family of Acyl-CoA oxidases EC:1.3.3.6. Acyl-CoA oxidase converts acyl-CoA into trans-2-enoyl-CoA.



Number of members: 39 



[2055] [1] Hayashi H, De Bellis L, Yamaguchi K, Kato A, Hayashi M, Nishimura M; Medline: 98192624. Molecular
characterization of a glyoxysomal long chain acyl-CoA oxidase that is synthesized as a precursor of higher molecular
mass in pumpkin. J Biol Chem 1998;273:8301-8307.
[2056] 832. (AICARFT_IMPCHas)
AICARFT/IMPCHase bienzyme

[2057] This is a family of bifunctional enzymes catalysing the last steps in de novo purine biosynthesis. The
bifunctional enzyme is found in both prokaryotes and eukaryotes. The second last step is catalysed by
5-aminoimidazole-4-carboxamide ribonucleotide formyltransferase EC:2.1.2.3 (AICARFT); this enzyme catalyses the
formylation of AICAR with 10-formyl-tetrahydrofolate to yield FAICAR and tetrahydrofolate [1]. The last step is catalysed
by IMP (inosine monophosphate) cyclohydrolase EC:3.5.4.10 (IMPCHase), cyclizing FAICAR
(5-formylaminoimidazole-4-carboxamide ribonucleotide) to IMP [1].



Number of members: 22 



[2058] 



[1] Akira T, Komatsu M, Nango R, Tomooka A, Konaka K, Yamauchi M, Kitamura Y, Nomura S, Tsukamoto I;
Medline: 97473523. Molecular cloning and expression of a rat cDNA encoding 5-aminoimidazole-4-carboxamide
ribonucleotide formyltransferase/IMP cyclohydrolase [published erratum appears in Gene 1998 Feb 27;208(2):337].
Gene 1997;197:289-293.

[2] Rayl EA, Moroson BA, Beardsley GP; Medline: 96147205. The human purH gene product, 5-aminoimidazole-
4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase. Cloning, sequencing, expression,
purification, kinetic analysis, and domain mapping. J Biol Chem 1996;271:2225-2233.

[2059] 833. (AOX) 
Alternative oxidase 

[2060] The alternative oxidase is used as a second terminal oxidase in the mitochondria: electrons are transferred
directly from reduced ubiquinol to oxygen, forming water [2]. This is not coupled to ATP synthesis and is not inhibited
by cyanide; this pathway is a single-step process [1]. In rice, the transcript levels of the alternative oxidase are increased
by low temperature [1].



Number of members: 27 



[2061]



[1] Ito Y, Saisho D, Nakazono M, Tsutsumi N, Hirai A; Medline: 98086211. Transcript levels of tandem-arranged
alternative oxidase genes in rice are increased by low temperature. Gene 1997;203:121-129.

[2] Li Q, Ritzel RG, McLean LL, McIntosh L, Ko T, Bertrand H, Nargang FE; Medline: 96366413. Cloning and analysis
of the alternative oxidase gene of Neurospora crassa. Genetics 1996;142:129-140.

[2062] 834. (APH) 

Protein kinases signatures and profile 

[2063] Cross-reference(s): PS00107; PROTEIN_KINASE_ATR PSC0108- 
PROTEIN_KINASE_ST, PS001 09; PROTEIN_KINASE_TYR, PS50011 • 
PROTEIN_KINASE_DOM 

[2064] Eukaryotic protein kinases [1 to 5] are enzymes that belong to a very extensive family of proteins which share
a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of
conserved regions in the catalytic domain of protein kinases. Two of these regions have been selected to build signature
patterns. The first region, which is located in the N-terminal extremity of the catalytic domain, is a glycine-rich stretch
of residues in the vicinity of a lysine residue, which has been shown to be involved in ATP binding. The second region,
which is located in the central part of the catalytic domain, contains a conserved aspartic acid residue which is important
for the catalytic activity of the enzyme [6]; two signature patterns were derived for that region: one specific for
serine/threonine kinases and the other for tyrosine kinases. A profile was developed which is based on the alignment in [1]
and covers the entire catalytic domain.

[2065] Consensus pattern: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-
x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K [K binds ATP]

[2066] Sequences known to belong to this class detected by the pattern: the majority of known protein kinases, but it
fails to find a number of them, especially viral kinases, which are quite divergent in this region and are completely
missed by this pattern.

[2067] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3) [D is an active site residue]

[2068] Sequences known to belong to this class detected by the pattern: most serine/threonine specific protein
kinases, with 10 exceptions (half of them viral kinases). Epstein-Barr virus BGLF4 and Drosophila ninaC are also missed:
they have Ser and Arg, respectively, instead of the conserved Lys and are therefore detected by the tyrosine kinase
specific pattern described below.

[2069] Consensus pattern: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3) [D is an active site residue]
Sequences known to belong to this class detected by the pattern: tyrosine specific protein kinases, with the exception
of human ERBB3 and mouse blk. This pattern will also detect most bacterial aminoglycoside phosphotransferases [8,9]
and herpesviruses ganciclovir kinases [10], which are proteins structurally and evolutionarily related to protein kinases.
Sequences known to belong to this class detected by the profile: ALL, except for three viral kinases. This profile also
detects receptor guanylate cyclases (see <PDOC00430>) and 2-5A-dependent ribonucleases. Sequence similarities
between these two families and the eukaryotic protein kinase family have been noticed before. It also detects
Arabidopsis thaliana kinase-like protein TMKL1, which seems to have lost its catalytic activity.

[2070] Note: if a protein analyzed includes the two protein kinase signatures, the probability of it being a protein kinase
is close to 100%.
Note: eukaryotic-type protein kinases have also been found in prokaryotes such as Myxococcus xanthus [11] and
Yersinia pseudotuberculosis.
Note: the patterns shown above have been updated since their publication in [7].
Note: this documentation entry is linked to both signature patterns and a profile. As the profile is much more
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so.
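The first note above can be expressed as a simple decision rule. A minimal sketch, assuming "the two protein kinase signatures" means the ATP-binding pattern plus either of the two catalytic-region patterns; the regular expressions are hand translations of the consensus patterns given above ({P} becomes [^P], x(5,18) becomes .{5,18}), the function names are hypothetical, and the demonstration sequences are synthetic. As the last note says, the profile is more sensitive than these patterns, so a negative result here does not rule out a kinase.

```python
import re

# Hand translations of the three kinase consensus patterns above.
ATP_BINDING = re.compile(
    r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD].[GSTACLIVMFY]"
    r".{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
)
ST_CATALYTIC = re.compile(r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}")
TYR_CATALYTIC = re.compile(r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]{3}")

def kinase_signatures(seq: str) -> set[str]:
    """Return the names of the signature patterns found anywhere in seq."""
    checks = {"ATP": ATP_BINDING, "ST": ST_CATALYTIC, "TYR": TYR_CATALYTIC}
    return {name for name, regex in checks.items() if regex.search(seq)}

def has_both_signatures(seq: str) -> bool:
    """True when the ATP-binding signature and a catalytic-region signature are both present."""
    hits = kinase_signatures(seq)
    return "ATP" in hits and bool(hits & {"ST", "TYR"})

if __name__ == "__main__":
    atp_part = "LGKGSFGKVMLARHKETGQYAMK"                 # synthetic ATP-binding region
    st_like = "MA" + atp_part + "GSGSGS" + "LIYRDLKPENLLI"
    tyr_like = "MA" + atp_part + "GSGSGS" + "YVHRDLRAANILV"
    for name, seq in [("st_like", st_like), ("tyr_like", tyr_like)]:
        print(name, sorted(kinase_signatures(seq)), has_both_signatures(seq))
```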
[2072] 835. (Asp_Glu_race)

Aspartate and glutamate racemases signatures 

[2073] Cross-reference(s): PS00923; ASP_GLU_RACEMASE_1, PS00924; ASP_GLU_RACEMASE_2

[2074] Aspartate racemase (EC 5.1.1.13) and glutamate racemase (EC 5.1.1.3) are two evolutionarily related bacterial
enzymes that do not seem to require a cofactor for their activity [1]. Glutamate racemase, which interconverts
L-glutamate into D-glutamate, is required for the biosynthesis of peptidoglycan and some peptide-based antibiotics such
as gramicidin S. In addition to characterized aspartate and glutamate racemases, this family also includes a hypothetical
protein from Erwinia carotovora and one from Escherichia coli (ygeA). Two conserved cysteines are present in the
sequence of these enzymes. They are expected to play a role in catalytic activity by acting as bases in proton abstraction
from the substrate. Signature patterns were developed for both cysteines.

[2075] Consensus pattern: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK]

Consensus pattern: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM] 
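Since both racemase signatures were built around a conserved cysteine, a scan can report where that residue falls rather than only whether the pattern matched. A minimal sketch, assuming the two patterns above are translated into Python regexes with the cysteine captured as a named group (x(0,1) becomes .?); the function name is hypothetical and the test sequence is synthetic, containing one copy of each signature.

```python
import re

# The two racemase signatures, with the conserved cysteine captured as group "cys".
SIGNATURES = {
    "ASP_GLU_RACEMASE_1": re.compile(r"[IVA][LIVM].(?P<cys>C).?N[ST][MSA][STH][LIVFYSTANK]"),
    "ASP_GLU_RACEMASE_2": re.compile(r"[LIVM]{2}.[AG](?P<cys>C)T[DEH][LIVMFY][PNGRS].[LIVM]"),
}

def conserved_cysteines(seq: str) -> dict[str, int]:
    """Map each matching signature to the 1-based position of its conserved cysteine."""
    positions = {}
    for name, regex in SIGNATURES.items():
        m = regex.search(seq)  # first occurrence only
        if m:
            positions[name] = m.start("cys") + 1
    return positions

if __name__ == "__main__":
    print(conserved_cysteines("MKILGCNSMSVGGGGLIAACTELPAVK"))  # synthetic sequence
```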

[2076] [1] Gallo K.A., Knowles J.R. Biochemistry 32:3981-3990(1993).

[2077] 836. (ATP-sulfurylase) 

ATP-sulfurylase 

[2078] This family consists of ATP-sulfurylase or sulfate adenylyltransferase EC:2.7.7.4, some of which are part of a
bifunctional polypeptide chain associated with adenosyl phosphosulphate (APS) kinase (APS_kinase). Both enzymes
are required for PAPS (phosphoadenosine-phosphosulfate) synthesis from inorganic sulphate [2]. ATP sulfurylase
catalyses the synthesis of adenosine-phosphosulfate (APS) from ATP and inorganic sulphate [1].

Number of members: 37 
[2079] 

[1] Kurima K, Warman ML, Krishnan S, Domowicz M, Krueger RC Jr, Deyrup A, Schwartz NB; Medline: 98337975.
A member of a family of sulfate-activating enzymes causes murine brachymorphism [published erratum appears
in Proc Natl Acad Sci U S A 1998 Sep 29;95(20):12071]. Proc Natl Acad Sci U S A 1998;95:8681-8685.
[2] Rosenthal E, Leustek T; Medline: 96096529. A multifunctional Urechis caupo protein, PAPS synthetase, has
both ATP sulfurylase and APS kinase activities. Gene 1995;165:243-248.

[2080] 837. (ATP-synt_F) 
ATP synthase (F/14-kDa) subunit 

[2081] This family includes the 14-kDa subunit from vATPases [1], which is in the peripheral catalytic part of the complex
[2]. The family also includes archaebacterial ATP synthase subunit F [3].

Number of members: 23 

[2082]

[1] Guo Y, Kaiser K, Wieczorek H, Dow JA; Medline: 96269411. The Drosophila melanogaster gene vha14 encoding
a 14-kDa F-subunit of the vacuolar ATPase. Gene 1996;172:239-243.

[2] Peng SB, Crider BP, Tsai SJ, Xie XS, Stone DK; Medline: 96216416. Identification of a 14-kDa subunit associated
with the catalytic sector of clathrin-coated vesicle H+-ATPase. J Biol Chem 1996;271:3324-3327.

[3] Wilms R, Freiberg C, Wegerle E, Meier I, Mayer F, Muller V; Medline: 96324965. Subunit structure and organization
of the genes of the A1A0 ATPase from the archaeon Methanosarcina mazei Gö1. J Biol Chem 1996;271:18843-18852.

[2083] 838. (CBD_4) 
Starch binding domain 

Number of members: 48 

[2084] 839. (CbiX) 

[2085] The function of CbiX is uncertain; however, it is found in cobalamin biosynthesis operons and so may have a
related function. Some CbiX proteins contain a striking histidine-rich region at their C-terminus, which suggests that it
might be involved in metal chelation [1].

Number of members: 6 

[2086] [1] Raux E, Lanois A, Warren MJ, Rambach A, Thermes C; Medline: 98416126. Cobalamin (vitamin B12)
biosynthesis: identification and characterization of a Bacillus megaterium cobI operon. Biochem J 1998;335:159-166.

840. (Complex1_51K) 

[2087] Respiratory-chain NADH dehydrogenase 51 Kd subunit signatures
Cross-reference(s): PS00644; COMPLEX1_51K_1, PS00645; COMPLEX1_51K_2

[2088] Respiratory-chain NADH dehydrogenase (EC 1.6.5.3) [1,2] (also known as complex I or NADH-ubiquinone
oxidoreductase) is an oligomeric enzymatic complex located in the inner mitochondrial membrane which also seems
to exist in the chloroplast and in cyanobacteria (as a NADH-plastoquinone oxidoreductase). Among the 25 to 30
polypeptide subunits of this bioenergetic enzyme complex there is one with a molecular weight of 51 Kd (in mammals),
which is the second largest subunit of complex I and is a component of the iron-sulfur (IP) fragment of the enzyme. It
seems to bind to NAD, FMN, and a 2Fe-2S cluster.
[2089] The 51 Kd subunit is highly similar to [3,4]:

- Subunit alpha of Alcaligenes eutrophus NAD-reducing hydrogenase (gene hoxF), which also binds to NAD, FMN
and a 2Fe-2S cluster.

- Subunit NQO1 of Paracoccus denitrificans NADH-ubiquinone oxidoreductase.

- Subunit F of Escherichia coli NADH-ubiquinone oxidoreductase (gene nuoF).

[2090] The 51 Kd subunit and the bacterial hydrogenase alpha subunit contain three regions of sequence
similarity. The first one most probably corresponds to the NAD-binding site, the second to the FMN-binding site and the
third one, which contains three cysteines, to the iron-sulfur binding region. Signature patterns have been developed
for the FMN-binding and for the 2Fe-2S binding regions.

[2091] Consensus pattern: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S
Consensus pattern: E-S-C-G-x-C-x-P-C-R-x-G [The three C's are putative 2Fe-2S ligands] 
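Capturing each of the three cysteines annotated as putative 2Fe-2S ligands in the second pattern yields their sequence positions directly. A minimal sketch; the regex is a direct translation of E-S-C-G-x-C-x-P-C-R-x-G with one capture group per cysteine, the function name is hypothetical, and the test sequence is synthetic.

```python
import re

# E-S-C-G-x-C-x-P-C-R-x-G with each cysteine in its own capture group.
COMPLEX1_51K_FE_S = re.compile(r"ES(C)G.(C).P(C)R.G")

def putative_fe_s_ligands(seq: str) -> list[int]:
    """Return 1-based positions of the three cysteines in the first match, or [] if none."""
    m = COMPLEX1_51K_FE_S.search(seq)
    if m is None:
        return []
    return [m.start(group) + 1 for group in (1, 2, 3)]

if __name__ == "__main__":
    print(putative_fe_s_ligands("MAAESCGKCTPCRSGLL"))  # synthetic sequence
```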

[1] Ragan C.I. Curr. Top. Bioenerg. 15:1-36(1987).

[2] Weiss H., Friedrich T., Hofhaus G., Preis D. Eur. J. Biochem. 197:563-576(1991).

[3] Fearnley I.M., Walker J.E. Biochim. Biophys. Acta 1140:105-134(1992).

[4] Weidner U., Geier S., Rock A., Friedrich T., Leif H., Weiss H. J. Mol. Biol. 233:109-122(1993).

[2092] 841. (DAP_epimerase)

Diaminopimelate epimerase signature

[2093] Cross-reference(s): PS01326; DAP_EPIMERASE

Diaminopimelate epimerase (EC 5.1.1.7) catalyzes the isomerization of L,L- to D,L-meso-diaminopimelate in the
biosynthetic pathway leading from aspartate to lysine. This enzyme is a protein of about 30 Kd. Two conserved cysteines
seem [1] to function as the acid and base in the catalytic mechanism. As a signature pattern, the region surrounding
the first of these two active site cysteines was selected.
[2094] Consensus pattern: N-x-D-G-S-x(4)-C-G-N-[GA]-x-R [C is an active site residue]. Sequences known to belong
to this class detected by the pattern: ALL, except for an Anabaena dapF which has a Ser instead of the active site Cys.
[2095] [1] Cirilli M., Zheng R., Scapin G., Blanchard J.S. Biochemistry 37:16452-16458(1998).
[2096] 842. (DNA_gyraseB_C) 
DNA topoisomerase II signature 

[2097] Cross-reference(s): PS00177; TOPOISOMERASE_II

DNA topoisomerase II (EC 5.99.1.3) [1,2,3,4,E1] is one of the two types of enzyme that catalyze the interconversion
of topological DNA isomers. Type II topoisomerases are ATP-dependent and act by passing a DNA segment through
a transient double-strand break. Topoisomerase II is found in phages, archaebacteria, prokaryotes, eukaryotes and
in African Swine Fever virus (ASF). In bacteriophage T4, topoisomerase II consists of three subunits (the products of
genes 39, 52 and 60). In prokaryotes and in archaebacteria the enzyme, known as DNA gyrase, consists of two subunits
(genes gyrA and gyrB [E2]). In some bacteria, a second type II topoisomerase has been identified; it is known as
topoisomerase IV and is required for chromosome segregation. It also consists of two subunits (genes parC and parE).
In eukaryotes, type II topoisomerase is a homodimer.

[2098] There are many regions of sequence homology between the different subtypes of topoisomerase II. The
relation between the different subunits is shown in the following representation:



<------------------ About 1400 residues ------------------>

[ Protein 39 * ][ Protein 52 ]    Phage T4
[ gyrB * ][ gyrA ]                Prokaryote II and archaebacteria
[ parE * ][ parC ]                Prokaryote IV
[ * ]                             Eukaryote and ASF

'*': Position of the pattern.



[2099] As a signature pattern for this family of proteins, a region that contains a highly conserved pentapeptide was
selected. The pattern is located in gyrB, in parE, and in protein 39 of phage T4 topoisomerase.
[21 00] Consensus pattern: [LI VMA]-x-E-G-[DN]-S-A-x-[STAG] 

[1] Sternglanz R., Curr. Opin. Cell Biol. 1:533-535(1990).

[2] Bjornsti M.-A., Curr. Opin. Struct. Biol. 1:99-103(1991).

[3] Sharma A., Mondragon A., Curr. Opin. Struct. Biol. 5:39-47(1995).

[4] Roca J., Trends Biochem. Sci. 20:156-160(1995).
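The topoisomerase II signature contains no variable-length elements, so it can also be written directly as a regular expression. The Python sketch below scans a sequence and reports each hit together with its 1-based start position. TOPO_II_SIGNATURE and topo_ii_hits are hypothetical names. The lookahead lets overlapping hits be reported. That is an illustrative design choice and says nothing about how PROSITE's own scanning tools behave.

```python
import re

# [LIVMA]-x-E-G-[DN]-S-A-x-[STAG], written out by hand as a regex.
# Wrapping it in a lookahead lets finditer report overlapping hits.
TOPO_II_SIGNATURE = re.compile(r"(?=([LIVMA].EG[DN]SA.[STAG]))")


def topo_ii_hits(sequence: str) -> list:
    """Return a list of (1-based start, matched segment) for each signature hit."""
    return [(m.start() + 1, m.group(1))
            for m in TOPO_II_SIGNATURE.finditer(sequence.upper())]
```

For example, topo_ii_hits("AAVKEGDSAKTQQ") returns [(3, "VKEGDSAKT")].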

[2101] 843. (DUF16) 
Protein of unknown function 

[2102] The function of this protein is unknown. It appears to occur only in Mycoplasma pneumoniae.
Number of members: 26 

[2103] [1] Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R; Medline: 97105885 "Complete sequence
analysis of the genome of the bacterium Mycoplasma pneumoniae." Nucleic Acids Res 1996;24:4420-4449.
[2104] 844. (DUF21)

[2105] Domain of unknown function

[2106] This transmembrane region has no known function. Many of the sequences in this family are annotated as
hemolysins. However, this annotation is due to a similarity to Swiss:Q54318, which does not contain this domain. This domain is
found in the N-terminus of the proteins, adjacent to two intracellular CBS domains.
Number of members: 42

[2107] 845. (DUF56) 
[2108] Integral membrane protein 
[2109] The members of this family are putative integral membrane proteins. The function of the family is unknown.
However, the family includes Sec59 from yeast. Sec59 is a dolichol kinase (EC:2.7.1.108), but it is not clear whether the enzymatic
activity resides in this region or in its N-terminal region.

Number of members: 13 


[2110] 846. (DUF94) 

[2111] Domain of unknown function 

[2112] The function of this domain is unknown. It is found in both eukaryotes and archaebacteria. The alignment
contains a completely conserved aspartate residue that may be functionally important. The eukaryotic domains contain
three conserved cysteines and a histidine that might be metal binding. However, these residues are absent in the archaebacterial
proteins.

Number of members: 9 

[2113] 847. (FF)
[2114] FF domain 

[2115] This domain may be involved in protein-protein interaction [1].
Number of members: 42 


[2116] [1] Bedford MT, Leder P; Medline: 99322199 "The FF domain: a novel motif that often accompanies WW
domains." Trends Biochem Sci 1999;24:264-265.
[2117] 848. (FLO_LFY) 
Floricaula / Leafy protein 

[2118] This family consists of various plant development proteins that are homologues of the floricaula (FLO) and Leafy
(LFY) floral meristem identity proteins. Mutations in the sequences of these proteins affect flower
and leaf development.

Number of members: 16 


[2119] 

[1] Hofer J, Turner L, Hellens R, Ambrose M, Matthews P, Michael A, Ellis N; Medline: 97411151 "UNIFOLIATA
regulates leaf and flower morphogenesis in pea." Curr Biol 1997;7:581-587.
[2] Weigel D, Alvarez J, Smyth DR, Yanofsky MF, Meyerowitz EM; Medline: 92274452 "LEAFY controls floral meristem
identity in Arabidopsis." Cell 1992;69:843-859.

[2120] 849. (G-patch) 
G-patch domain 

[2121] This domain is found in a number of RNA binding proteins, and is also found in proteins that contain RNA
binding domains. This suggests that this domain may have an RNA binding function. This domain has seven highly 
conserved glycines. 

Number of members: 47 


[2122] [1] Aravind L, Koonin EV; Medline: 10470032 "G-patch: a new conserved domain in eukaryotic RNA-processing
proteins and type D retroviral polyproteins." Trends Biochem Sci 1999;24:342-344.
[2123] 850. (Gram-ve_porins) 
General diffusion Gram-negative porins signature 
[2124] Cross-reference(s) PS00576; GRAM_NEG_PORIN

The outer membrane of Gram-negative bacteria acts as a molecular filter for hydrophilic compounds. Proteins known
as porins [1] are responsible for the 'molecular sieve' properties of the outer membrane. Porins form large water-filled
channels that allow the diffusion of hydrophilic molecules into the periplasmic space. Some porins form general
diffusion channels that allow any solute up to a certain size (that size is known as the exclusion limit) to cross the
membrane. Other porins are specific for a solute and contain a binding site for that solute inside the pore (these
are known as selective porins). As porins are the major outer membrane proteins, they also serve as receptor sites
for the binding of phages and bacteriocins. General diffusion porins generally assemble as trimers in the membrane,
and the transmembrane core of these proteins is composed exclusively of beta strands [2]. It has been shown [3] that
a number of general porins are evolutionarily related. These porins are:

Enterobacteria phoE. 
Enterobacteria ompC. 
Enterobacteria ompF. 
Enterobacteria nmpC. 
Bacteriophage PA-2 LC.
Neisseria PI.A.
Neisseria PI.B.



[2125] As a signature pattern, a conserved region was selected. It is located in the C-terminal part of these proteins
and spans two putative transmembrane beta strands.

[2126] Consensus pattern: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-[LIVMFYW]-V

[1] Benz R., Bauer K., Eur. J. Biochem. 176:1-19(1988).

[2] Jap B.K., Walian P.J., Q. Rev. Biophys. 23:367-403(1990).

[3] Jeanteur D., Lakey J.H., Pattus F., Mol. Microbiol. 5:2153-2164(1991).

[2127] 851. (HlyD)

HlyD family secretion proteins signature 

[2128] Cross-reference(s) PS00543; HLYD_FAMILY 

Gram-negative bacteria produce a number of proteins that are secreted into the growth medium by a mechanism
that does not require a cleaved N-terminal signal sequence. These proteins have different functions, but all require
the help of two or more proteins for their secretion across the cell envelope. These helper proteins include a member of the
ABC transporter family (see the relevant entry <PDOC00185>) and a member of a family that is currently
composed [1 to 5] of the following proteins:



Gene  Species                   Protein which is exported
hlyD  Escherichia coli          Hemolysin
appD  A. pleuropneumoniae       Hemolysin
lcnD  Lactococcus lactis        Lactococcin A
lktD  A. actinomycetemcomitans  Leukotoxin
      Pasteurella haemolytica
rtxD  A. pleuropneumoniae       Toxin-III
cyaD  Bordetella pertussis      Calmodulin-sensitive adenylate cyclase-hemolysin (cyclolysin)
cvaA  Escherichia coli          Colicin V
prtE  Erwinia chrysanthemi      Extracellular proteases B and C
aprE  Pseudomonas aeruginosa    Alkaline protease
emrA  Escherichia coli          Drugs and toxins
yjcR  Escherichia coli          Unknown



These proteins are evolutionarily related and consist of 390 to 480 amino acid residues. They seem to be anchored
in the inner membrane by an N-terminal transmembrane region. Their exact role in the secretion process is not yet
known. The C-terminal section of these proteins is the best conserved region, and a signature pattern was derived
from that region.

[2129] Consensus pattern: [LIVM]-x(2)-G-[LM]-x(3)-[STGAV]-x-[LIVMT]-x-[LIVMT]-[GE]-x-[KR]-x-[LIVMFYW](2)-x(2)-[LIVMFYW](3)

Sequences known to belong to this class detected by the pattern: ALL, except for emrA and yjcR.
References:
[2130] 

[1] Gilson L., Mahanty H.K., Kolter R., EMBO J. 9:3875-3884(1990).

[2] Letoffe S., Delepelaire P., Wandersman C., EMBO J. 9:1375-1382(1990).

[3] Stoddard G.W., Petzel J.P., van Belkum M.J., Kok J., McKay L.L., Appl. Environ. Microbiol. 58:1952-1961(1992).
[4] Duong F., Lazdunski A., Cami B., Murgier M., Gene 121:47-54(1992).
[5] Lewis K., Trends Biochem. Sci. 19:119-123(1994).

[2131] 852. (IBR) 
In Between Ring fingers 

[2132] The IBR (In Between Ring fingers) domain occurs between pairs of ring fingers (zf-C3HC4). The
function of this domain is unknown. This domain has also been called the C6HC domain and the DRIL (double RING
finger linked) domain [2].

Number of members: 25

[2133] 

[1] Morett E, Bork P; Medline: 10366851 "A novel transactivation domain in parkin." Trends Biochem Sci 1999;24:
229-231.

[2] van der Reijden BA, Erpelinck-Verschueren CA, Lowenberg B, Jansen JH; Medline: 99349709 "TRIADS: a new
class of proteins with a novel cysteine-rich signature." Protein Sci 1999;8:1557-1561.

[2134] 853. (IPPT)
IPP transferase 

[1] Durand JM, Bjork GR, Kuwae A, Yoshikawa M, Sasakawa C; Medline: 97440126 "The modified nucleoside
2-methylthio-N6-isopentenyladenosine in tRNA of Shigella flexneri is required for expression of virulence genes."
J Bacteriol 1997;179:5777-5782.

[2] Boguta M, Hunter LA, Shen WC, Gillman EC, Martin NC, Hopper AK; Medline: 94187700 "Subcellular locations
of MOD5 proteins: mapping of sequences sufficient for targeting to mitochondria and demonstration that mitochondrial
and nuclear isoforms commingle in the cytosol." Mol Cell Biol 1994;14:2298-2306.
[3] Gillman EC, Slusher LB, Martin NC, Hopper AK; Medline: 91203856 "MOD5 translation initiation sites determine
N6-isopentenyladenosine modification of mitochondrial and cytoplasmic tRNA." Mol Cell Biol 1991;11:2382-2390.

[2135] 854. (KE2) 
KE2 family protein 

[2136] The function of members of this family is unknown, although they have been suggested to contain a DNA 
binding leucine zipper motif [2]. 

Number of members: 9 

[2137] 

[1] Ha H, Abe K, Artzt K; Medline: 92084131 "Primary structure of the embryo-expressed gene KE2 from the mouse
H-2K region." Gene 1991;107:345-346.

[2] Shang HS, Wong SM, Tan HM, Wu M; Medline: 95129859 "YKE2, a yeast nuclear gene encoding a protein
showing homology to mouse KE2 and containing a putative leucine-zipper motif." Gene 1994;151:197-201.

[2138] 855. (Lipoprotein_6) 

Prokaryotic membrane lipoprotein lipid attachment site 

[2139] Cross-reference(s) PS00013; PROKAR_LIPOPROTEIN

In prokaryotes, membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific
lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream
of a cysteine residue to which a glyceride-fatty acid lipid is attached [1]. Some of the proteins currently known to undergo such
processing are listed below (for recent listings see [1,2,3]):
Major outer membrane lipoprotein (murein-lipoproteins) (gene lpp).
Escherichia coli lipoprotein-28 (gene nlpA).
Escherichia coli lipoprotein-34 (gene nlpB).
Escherichia coli lipoprotein nlpC.
Escherichia coli lipoprotein nlpD.
Escherichia coli osmotically inducible lipoprotein B (gene osmB).
Escherichia coli osmotically inducible lipoprotein E (gene osmE).
Escherichia coli peptidoglycan-associated lipoprotein (gene pal).
Escherichia coli rare lipoproteins A and B (genes rlpA and rlpB).
Escherichia coli copper homeostasis protein cutF (or nlpE).
Escherichia coli plasmids traT proteins.
Escherichia coli Col plasmids lysis proteins.
A number of Bacillus beta-lactamases.
Bacillus subtilis periplasmic oligopeptide-binding protein (gene oppA).
Borrelia burgdorferi outer surface proteins A and B (genes ospA and ospB).
Borrelia hermsii variable major protein 21 (gene vmp21) and 7 (gene vmp7).
Chlamydia trachomatis outer membrane protein 3 (gene omp3).
Fibrobacter succinogenes endoglucanase cel-3.
Haemophilus influenzae proteins Pal and Pep.
Klebsiella pullulanase (gene pulA).
Klebsiella pullulanase secretion protein pulS.
Mycoplasma hyorhinis protein p37.
Mycoplasma hyorhinis variant surface antigens A, B, and C (genes vlpABC).
Neisseria outer membrane protein H.8.
Pseudomonas aeruginosa lipopeptide (gene lppL).
Pseudomonas solanacearum endoglucanase egl.
Rhodopseudomonas viridis reaction center cytochrome subunit (gene cytC).
Rickettsia 17 kDa antigen.
Shigella flexneri invasion plasmid proteins mxiJ and mxiM.
Streptococcus pneumoniae oligopeptide transport protein A (gene amiA).
Treponema pallidum 34 kDa antigen.
Treponema pallidum membrane protein A (gene tmpA).
Vibrio harveyi chitobiase (gene chb).
Yersinia virulence plasmid protein yscJ.
Halocyanin from Natrobacterium pharaonis [4], a membrane-associated copper-binding protein (this is the first
archaebacterial protein known to be modified in such a fashion).

[2140] From the precursor sequences of all these proteins, a consensus pattern and a set of rules to identify this 
type of post-translational modification were derived. 

[2141] Consensus pattern: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C [C is the lipid attachment
site]. Additional rules:

[2142] 1) The cysteine must be between positions 15 and 35 of the sequence under consideration. 2) There must be at
least one Lys or one Arg in the first seven positions of the sequence. Sequences known to belong to this class detected
by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: some 100 prokaryotic proteins. Some of them are
not membrane lipoproteins, but at least half of them could be.
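The pattern and the two additional rules can be combined in a short filter. The Python sketch below is a minimal illustration. It assumes that the input is a precursor sequence starting at its first residue. LIPOBOX and lipid_attachment_site are hypothetical names. The function returns the 1-based position of the first Cys that satisfies both the pattern and the rules, or None if no Cys does.

```python
import re
from typing import Optional

# {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C as a regex. The lookahead
# allows overlapping candidates, and group 1 captures the lipid-modified Cys.
LIPOBOX = re.compile(r"(?=[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS](C))")


def lipid_attachment_site(sequence: str) -> Optional[int]:
    """Return the 1-based position of the predicted lipid-attached Cys, or None."""
    seq = sequence.upper()
    # Rule 2: at least one Lys or Arg among the first seven residues.
    if not any(residue in "KR" for residue in seq[:7]):
        return None
    for m in LIPOBOX.finditer(seq):
        cys_position = m.start(1) + 1
        # Rule 1: the Cys must lie between positions 15 and 35.
        if 15 <= cys_position <= 35:
            return cys_position
    return None
```

For the made-up precursor "MKRLLAVSLALLLAGCSSNAK", the function returns 16. If the two basic residues near the N-terminus are removed, rule 2 fails and the function returns None.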

References 

[2143] 

[1] Hayashi S., Wu H.C., J. Bioenerg. Biomembr. 22:451-471(1990).
[2] Klein P., Somorjai R.L., Lau P.C.K., Protein Eng. 2:15-20(1988).
[3] von Heijne G., Protein Eng. 2:531-534(1989).

[4] Mattar S., Scharf B., Kent S.B.H., Rodewald K., Oesterhelt D., Engelhard M., J. Biol. Chem. 269:14939-14945(1994).

[2144] 856. (Lipoprotein_7) 
Adhesin lipoprotein 
[2145] This family consists of the p50 and variable adherence-associated antigen (Vaa) adhesins from Mycoplasma
hominis. M. hominis is a mycoplasma associated with human urogenital diseases, pneumonia, and septic arthritis [1].
An adhesin is a cell surface molecule that mediates adhesion to other cells or to the surrounding surface or substrate.
The Vaa antigen is a 50-kDa surface lipoprotein that has four tandem repetitive DNA sequences encoding a periodic
peptide structure, and it is highly immunogenic in the human host [1]. p50 is also a 50-kDa lipoprotein. It has three
repeats (A, B and C) and may be a tetramer of 191 kDa in its native environment [2].

Number of members: 18 

[2146]

[1] Zhang Q, Wise KS; Medline: 96294788 "Molecular basis of size and antigenic variation of a Mycoplasma hominis
adhesin encoded by divergent vaa genes." Infect Immun 1996;64:2737-2744.

[2] Henrich B, Kitzerow A, Feldmann RC, Schaal H, Hadding U; Medline: 97047675 "Repetitive elements of the
Mycoplasma hominis adhesin p50 can be differentiated by monoclonal antibodies." Infect Immun 1996;64:4027-4034.

[2147] 857. (MaoC_like) 
MaoC like domain 

[2148] The MaoC protein shares similarity with a wide variety of enzymes: estradiol 17 beta-dehydrogenase
4, peroxisomal hydratase-dehydrogenase-epimerase, and the fatty acid synthase beta subunit. All these enzymes contain other
domains. This domain is also present in the NodN nodulation protein N. No specific function has been assigned to this
region of any of these proteins. The maoC gene is part of an operon with maoA, which is involved in the synthesis of
monoamine oxidase [1].

Number of members: 46 

[2149] [1] Sugino H, Sasaki M, Azakami H, Yamashita M, Murooka Y; Medline: 96235221 "A monoamine-regulated
Klebsiella aerogenes operon containing the monoamine oxidase structural gene (maoA) and the maoC gene." J
Bacteriol 1992;174:2485-2492.
[2150] 858. (MSP) 

Manganese-stabilizing protein / photosystem II polypeptide 

[2151] This family consists of the 33 kDa photosystem II polypeptide from the oxygen evolving complex (OEC) of
plants and cyanobacteria. The protein is also known as the manganese-stabilizing protein, because it is associated with the
manganese complex of the OEC and may provide the ligands for the complex [1].

Number of members: 17 

[2152] [1] Philbrick JB, Zilinskas BA; Medline: 88334494 "Cloning, nucleotide sequence and mutational analysis of
the gene encoding the Photosystem II manganese-stabilizing polypeptide of Synechocystis 6803." Mol Gen Genet
1988;212:418-425.
[2153] 859. (NAG) 

[2154] [1] Makarova KS, Aravind L, Galperin MY, Grishin NV, Tatusov RL, Wolf YI, Koonin EV; Medline: 99342100
"Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and
the variable shell." Genome Res 1999;9:608-628.

Number of members: 27 

[2155] 860. (Nop)

Putative snoRNA binding domain

[2156] This family consists of various pre-RNA processing ribonucleoproteins. The function of the aligned region is
unknown, but it may be a common RNA, snoRNA or Nop1p binding domain. Nop5p (Nop58p) Swiss:Q12499
from yeast is the protein component of a ribonucleoprotein required for pre-18S rRNA processing and is
suggested to function with Nop1p in a snoRNA complex [1]. Nop56p Swiss:O00567 and Nop5p interact with Nop1p and
are required for ribosome biogenesis [2]. Prp31p Swiss:P49704 is required for pre-mRNA splicing in S. cerevisiae [3].
Number of members: 23
[2157] 

[1] Wu P, Brockenbrough JS, Metcalfe AC, Chen S, Aris JP; Medline: 98298165 "Nop5p is a small nucleolar
ribonucleoprotein component required for pre-18 S rRNA processing in yeast." J Biol Chem 1998;273:16453-16463.

[2] Gautier T, Berges T, Tollervey D, Hurt E; Medline: 8038777 "Nucleolar KKE/D repeat proteins Nop56p and Nop58p
interact with Nop1p and are required for ribosome biogenesis." Mol Cell Biol 1997;17:7088-7098.

[3] Weidenhammer EM, Singh M, Ruiz-Noriega M, Woolford JL Jr; Medline: 96184869 "The PRP31 gene encodes
a novel protein required for pre-mRNA splicing in Saccharomyces cerevisiae." Nucleic Acids Res 1996;24:1164-1170.

[2158] 861. (Nramp)

Natural resistance-associated macrophage protein 

The natural resistance-associated macrophage protein (NRAMP) family consists of Nramp1, Nramp2, and the yeast
proteins Smf1 and Smf2. The NRAMP family is a novel family of functionally related proteins defined by a conserved
hydrophobic core of ten transmembrane domains [5]. The members of this family of membrane proteins are divalent cation
transporters. Nramp1 is an integral membrane protein expressed exclusively in cells of the immune system and is recruited to the
membrane of a phagosome upon phagocytosis [1]. By controlling divalent cation concentrations, Nramp1 may regulate
the interphagosomal replication of bacteria [1]. Mutations in Nramp1 may genetically predispose an individual to
diseases including leprosy and tuberculosis. Conversely, the same mutations might provide protection from rheumatoid arthritis [1].
Nramp2 is a multiple divalent cation transporter for Fe2+, Mn2+ and Zn2+, amongst others. It is expressed at high
levels in the intestine and is the major transferrin-independent iron uptake system in mammals [1]. The yeast proteins
Smf1 and Smf2 may also transport divalent cations [3].

Number of members: 36 

[2159] 

[1] Govoni G, Gros P; Medline: 98383996 "Macrophage NRAMP1 and its role in resistance to microbial infections."
Inflamm Res 1998;47:277-284.

[2] Agranoff DD, Krishna S; Medline: 98294035 "Metal ion homeostasis and intracellular parasitism." Mol Microbiol
1998;28:403-412.

[3] Pinner E, Gruenheid S, Raymond M, Gros P; Medline: 98030559 "Functional complementation of the yeast
divalent cation transporter family SMF by NRAMP2, a member of the mammalian natural resistance-associated
macrophage protein family." J Biol Chem 1997;272:28933-28938.

[4] Cellier M, Belouchi A, Gros P; Medline: 96402487 "Resistance to intracellular infections: comparative genomic
analysis of Nramp." Trends Genet 1996;12:201-204.

[5] Cellier M, Prive G, Belouchi A, Kwan T, Rodrigues V, Chia W, Gros P; Medline: 96036029 "Nramp defines a
family of membrane proteins." Proc Natl Acad Sci U S A 1995;92:10089-10093.

[2160] 862. (NTP_transf_2) 
Nucleotidyltransferase domain 

Members of this family belong to a large family of nucleotidyltransferases [1].
Number of members: 83 

[2161] [1] Holm L, Sander C; Medline: 96005605 "DNA polymerase beta belongs to an ancient nucleotidyltransferase
superfamily." Trends Biochem Sci 1995;20:345-347.
[2162] 863. (Paramyxo_P) 
Paramyxovirus P phosphoprotein 

[2163] This family consists of the paramyxovirus P phosphoprotein from Sendai virus and human and bovine parainfluenza
viruses. The P protein is an essential part of the viral RNA polymerase complex formed from the P and L proteins
[1]. The exact role of the P protein in this complex is unknown, but it is involved in multiple protein-protein interactions
and in binding the polymerase complex to the nucleocapsid or ribonucleoprotein template [1]. It also appears to be
important for the proper folding of the L protein [1]. The paramyxoviruses have a negative-sense ssRNA genome [1].
Number of members: 15
[2164] 

[1] Bowman MC, Smallwood S, Moyer SA; Medline: 99329169 "Dissection of Individual Functions of the Sendai
Virus Phosphoprotein in Transcription." J Virol 1999;73:6474-6483.

[2] Matsuoka Y, Curran J, Pelet T, Kolakofsky D, Ray R, Compans RW; Medline: 91237868 "The P gene of human
parainfluenza virus type 1 encodes P and C proteins but not a cysteine-rich V protein." J Virol 1991;65:3406-3410.

[2165] 864. (Patatin) 

[2166] This family consists of various patatin glycoproteins from plants. The patatin protein accounts for up to 40% 
of the total soluble protein in potato tubers [2]. Patatin is a storage protein but it also has the enzymatic activity of lipid 
acyl hydrolase, catalysing the cleavage of fatty acids from membrane lipids [2]. 

Number of members: 21 

[2167] 

[1] Banfalvi Z, Kostyal Z, Barta E; Medline: 95107249 "Solanum brevidens possesses a non-sucrose-inducible
patatin gene." Mol Gen Genet 1994;245:517-522.

[2] Mignery GA, Pikaard CS, Park WD; Medline: 88226014 "Molecular characterization of the patatin multigene
family of potato." Gene 1988;62:27-44.

[2168] 865. (Pentapeptide_2) 
Pentapeptide repeats (8 copies) 

[2169] These repeats are found in many mycobacterial proteins. They are most common in the PPE family
of proteins, where they occur in the MPTR subfamily of PPE proteins. The function of these repeats is unknown.
The repeat can be approximately described as XNXGX, where X can be any amino acid. These repeats are similar to
Pentapeptide [1], but it is not clear whether these two families are structurally related.

Number of members: 362 
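Tandem copies of the repeat can be counted with a short scan. The sketch below reads the approximate XNXGX description strictly, requiring Asn at the second position and Gly at the fourth position of every five-residue unit. That strict reading is a simplifying assumption. The Python function, given the hypothetical name longest_xnxgx_run, tries all five frames and returns the length, in units, of the longest uninterrupted run.

```python
import re

# One repeat unit under the strict XNXGX reading: N at position 2, G at position 4.
UNIT = re.compile(r".N.G.")


def longest_xnxgx_run(sequence: str) -> int:
    """Length (in 5-residue units) of the longest tandem run of XNXGX units."""
    seq = sequence.upper()
    best = 0
    for frame in range(5):  # a run of repeats may begin at any of five offsets
        run = 0
        for start in range(frame, len(seq) - 4, 5):
            if UNIT.fullmatch(seq, start, start + 5):
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best
```

For example, the made-up sequence "MMANLGSENAGTDNVGKPP" contains three consecutive units (ANLGS, ENAGT, DNVGK), so the function returns 3.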

[2170] 

[1] Bateman A, Murzin A, Teichmann SA; Medline: 98318059 "Structure and distribution of pentapeptide repeats
in bacteria." Protein Sci 1998;7:1477-1480.

[2] Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE 3rd,
Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S,
Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG; Medline: 98295987 "Deciphering the biology of Mycobacterium
tuberculosis from the complete genome sequence." Nature 1998;393:537-544.

[2171] 866. (Peptidase_C13) 
Peptidase C13 family

This family of peptidases is known as the hemoglobinase family because it contains a globin-degrading enzyme from
blood parasites, Swiss:P42665. However, relatives with other functions are found in plants and other organisms.
Members of this family are asparaginyl peptidases [1].

Number of members: 26 

[2172] [1] ... Rawlings ND, Brown MA, Young NE, Stevens RA, Hewitt E, Watts C, Barrett AJ; ... "... characterization
of mammalian legumain, an asparaginyl endopeptidase."

[2173] 867. (Pro_dh)
Proline dehydrogenase 

Number of members: 25 

[2174] [1] Ling M, Allen SW, Wood JM; Medline: 95055736 "Sequence analysis identifies the proline dehydrogenase
and delta 1-pyrroline-5-carboxylate dehydrogenase domains of the multifunctional Escherichia coli PutA protein." J
Mol Biol 1994;243:950-956.

EP 1 033 405 A2
[2175] 868. (PsbP) 

[2176] This family consists of the 23 kDa subunit of the oxygen evolving system of photosystem II (PsbP) from various
plants (where it is encoded by the nuclear genome) and from cyanobacteria. The 23 kDa PsbP protein is required for PSII
to be fully operational in vivo. It increases the affinity of the water oxidation site for Cl- and provides the conditions
required for high-affinity binding of Ca2+ [2].

Number of members: 25 

[2177] 

[1] Rova EM, Mc Ewen B, Fredriksson PO, Styring S; Medline: 97067138 "Photoactivation and photoinhibition are
competing in a mutant of Chlamydomonas reinhardtii lacking the 23-kDa extrinsic subunit of photosystem II." J Biol
Chem 1996;271:28918-28924.

[2] Kochhar A, Khurana JP, Tyagi AK; Medline: 97191538 "Nucleotide sequence of the psbP gene encoding
precursor of 23-kDa polypeptide of oxygen-evolving complex in Arabidopsis thaliana and its expression in the
wild-type and a constitutively photomorphogenic mutant." DNA Res 1996;3:277-285.

[2178] 869. (PUA) 

[2179] The PUA domain, named after PseudoUridine synthase and Archaeosine transglycosylase, was detected in
archaeal and eukaryotic pseudouridine synthases, archaeal archaeosine synthases, a family of predicted ATPases
that may be involved in RNA modification, and a family of predicted archaeal and bacterial rRNA methylases. Additionally,
the PUA domain was detected in a family of eukaryotic proteins that also contain a domain homologous to the translation
initiation factor eIF1/SUI1. These proteins may comprise a novel type of translation factor. Unexpectedly, the PUA
domain was also detected in bacterial and yeast glutamate kinases. This is compatible with the demonstrated role of
these enzymes in the regulation of the expression of other genes [1]. The PUA domain is predicted to be an RNA
binding domain.

Number of members: 48 

[2180] [1] Aravind L, Koonin EV; Medline: 99193178 "Novel predicted RNA-binding domains associated with the
translation machinery." J Mol Evol 1999;48:291-302.
[2181] 870. (RF1) 
eRF1-like proteins

[2182] Members of this family are peptide chain release factors. The eukaryotic Release Factor 1 proteins (eRF1s)
are involved in termination of translation. The eRF1 protein is functional for all stop codons and appears to abolish
read-through of these codons. This family also includes other proteins for which the precise molecular function is
unknown. Many of them are from archaebacteria. These proteins may also be involved in translation termination, but
this awaits experimental verification.

Number of members: 25 

[2183] 

[1] Frolova L, Le Goff X, Rasmussen HH, Cheperegin S, Drugeon G, Kress M, Arman I, Haenni AL, Celis JE,
Philippe M, et al; Medline: 95082951 "A highly conserved eukaryotic protein family possessing properties of
polypeptide chain release factor." [see comments] Nature 1994;372:701-703.

[2] Drugeon G, Jean-Jean O, Frolova L, Le Goff X, Philippe M, Kisselev L, Haenni AL; Medline: 97315314 "Eukaryotic
release factor 1 (eRF1) abolishes readthrough and competes with suppressor tRNAs at all three termination
codons in messenger RNA." Nucleic Acids Res 1997;25:2254-2258.

[2184] 871. (Ribosomal_L14e)
Ribosomal protein L14

[2185] This family includes the eukaryotic ribosomal protein L14.
Number of members: 15



[2186] 872. (Ribosomal_S27) 
Ribosomal protein S27a 

[2187] This family of ribosomal proteins consists mainly of the 40S ribosomal protein S27a, which is synthesized as
a C-terminal extension of ubiquitin (CEP). The S27a domain comprises the C-terminal half of the protein. The
synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a
transient metabolic stabilization and is required for efficient ribosome biogenesis [3]. The ribosomal extension protein
S27a contains a basic region that is proposed to form a zinc finger. Its fusion gene has been proposed as a mechanism to
maintain a fixed ratio between ubiquitin, which is necessary for degrading proteins, and ribosomes, which are a source of proteins [2].

Number of members: 36 



[2188] 873. (Spermine_synth)
Spermine/spermidine synthase

[2189] Spermine and spermidine are polyamines. This family includes spermine synthase and spermidine synthase. Spermidine synthase catalyses
the fifth (last) step in the biosynthesis of spermidine from arginine.

Number of members: 39 



[2190] 



[1] Mezquita J, Pau M, Mezquita C; Medline: 97449308 "Characterization and expression of two chicken cDNAs
encoding ubiquitin fused to ribosomal proteins of 52 and 80 amino acids." Gene 1997;195:313-319.
[2] Redman KL, Rechsteiner M; Medline: 89181932 "Identification of the long ubiquitin extension as ribosomal
protein S27a." Nature 1989;338:438-440.

[3] Finley D, Bartel B, Varshavsky A; Medline: 89181925 "The tails of ubiquitin precursors are ribosomal proteins
whose fusion to ubiquitin facilitates ribosome biogenesis." Nature 1989;338:394-401.

[2191] 874. (Surp)
Surp module

[1] Denhez F, Lafyatis R; Medline: 94266805 "Conservation of regulated alternative splicing and identification of
functional domains in vertebrate homologs to the Drosophila splicing regulator, suppressor-of-white-apricot." J Biol Chem
1994;269:16170-16179.

[2192] This domain is also known as the SWAP domain. SWAP stands for Suppressor-of-White-APricot. It has been
suggested that these domains may bind RNA [1].

Number of members: 32 



[2193] 875. (TFIIE)
TFIIE alpha subunit

The general transcription factor TFIIE has an essential role in eukaryotic transcription initiation together with RNA
polymerase II and other general factors. Human TFIIE consists of two subunits, TFIIE-alpha Swiss:P29083 and
TFIIE-beta Swiss:P29084, and joins the preinitiation complex after RNA polymerase II and TFIIF [1]. This family consists of
the conserved amino-terminal region of eukaryotic TFIIE-alpha [2] and of archaebacterial proteins that are presumed
also to be TFIIE-alpha subunits, e.g. Swiss:O29501 [3].



Number of members: 12 



[2194] 



[1] Ohkuma Y, Sumimoto H, Hoffmann A, Shimasaki S, Horikoshi M, Roeder RG; Medline: 92065982 "Structural
motifs and potential sigma homologies in the large subunit of human general transcription factor TFIIE." Nature
1991;354:398-401.

[2] Ohkuma Y, Hashimoto S, Roeder RG, Horikoshi M; Medline: 93087200 "Identification of two large subdomains
in TFIIE-alpha on the basis of homology between Xenopus and human sequences." Nucleic Acids Res 1992;20:
5838-5838.

[3] Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson
JD, Richardson DL, Kerlavage AR, Graham DE, Kyrpides NC, Fleischmann RD, Quackenbush J, Lee NH, Sutton
GG, Gill S, Kirkness EF, Dougherty BA, McKenney K, Adams MD, Loftus B, Venter JC, et al; Medline: 98049343
"The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus."
Nature 1997;390:364-370.

[2195] 876. (Transglut_core) 

[2196] Cross-reference(s) PS00547; TRANSGLUTAMINASES 

Transglutaminases (EC 2.3.2.13) (TGase) [1,2] are calcium-dependent enzymes that catalyze the cross-linking of
proteins by promoting the formation of isopeptide bonds between the gamma-carboxyl group of a glutamine in one
polypeptide chain and the epsilon-amino group of a lysine in a second polypeptide chain. TGases also catalyze the conjugation
of polyamines to proteins. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein
composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin
chains, thus stabilizing the fibrin clot. Other forms of transglutaminase are widely distributed in various organs, tissues
and body fluids. Sequence data are available for the following forms of TGase:

- Transglutaminase K (TGase K), a membrane-bound enzyme found in mammalian epidermis and important for the
formation of the cornified cell envelope (gene TGM1).

- Tissue transglutaminase (TGase C), a monomeric ubiquitous enzyme located in the cytoplasm (gene TGM2).

- Transglutaminase 3, responsible for the later stages of cell envelope formation in the epidermis and the hair follicle
(gene TGM3).

- Transglutaminase 4 (gene TGM4).

[2197] A conserved cysteine is known to be involved in the catalytic mechanism of TGases. The erythrocyte membrane
band 4.2 protein, which probably plays an important role in regulating the shape of erythrocytes and their mechanical
properties, is evolutionarily related to TGases. However, its active site cysteine is substituted by an alanine, and the
4.2 protein does not show TGase activity.

[2198] Consensus pattern: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G [The first C is the
active site residue] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected
in SWISS-PROT: NONE.
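A PROSITE pattern of this kind can be searched with an ordinary regular-expression engine once its notation is translated: x becomes any residue, x(2) a fixed-length gap, {...} an excluded set, and < or > a sequence end. The following is a minimal Python sketch of that translation, assuming only these basic constructs occur (special cases such as > inside brackets are not handled); the function names and the test peptide are hypothetical and are not part of the PROSITE distribution.

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression.

    Handles single residues, [..] alternatives, {..} exclusions, x wildcards,
    (n) and (n,m) repeat counts, and < / > terminal anchors only.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        start_anchor = "^" if element.startswith("<") else ""
        end_anchor = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Separate the residue specification from an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a single residue or a [..] class is already regex syntax
        if high:
            regex += "{%s,%s}" % (low, high)
        elif low:
            regex += "{%s}" % low
        parts.append(start_anchor + regex + end_anchor)
    return "".join(parts)

TGASE_PATTERN = "[GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G"

def active_site_positions(sequence):
    """Return 1-based positions of the putative active-site residue per match.

    The active-site residue is the third pattern element ([CA]), i.e. offset 2
    within each match.
    """
    rx = re.compile(prosite_to_regex(TGASE_PATTERN))
    return [m.start() + 3 for m in rx.finditer(sequence)]

print(prosite_to_regex(TGASE_PATTERN))
print(active_site_positions("MKGQCWVASGIAATALRCLGK"))  # prints [5]
```

Translating the pattern once and compiling it keeps the search itself in the standard library; a production scan would more likely use the ScanProsite tools, which also handle the profile entries.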

[2199] [1] Ichinose A., Bottenus R.E., Davie E.W. J. Biol. Chem. 265:13411-13414(1990). [2] Greenberg C.S.,
Birckbichler P.J., Rice R.H. FASEB J. 5:3071-3077(1991).

[2200] 877. (TruB_N) TruB family pseudouridylate synthase (N terminal domain)

Members of this family are involved in modifying bases in RNA molecules. They carry out the conversion of uracil
bases to pseudouridine. This family includes TruB, a pseudouridylate synthase that specifically converts uracil 55 to
pseudouridine in most tRNAs. This family also includes Cbf5p, which modifies rRNA [2].

Number of members: 33 

[2201] 

[1] Nurse K, Wrzesinski J, Bakin A, Lane BG, Ofengand J; Medline: 96079944 Purification, cloning, and properties
of the tRNA psi 55 synthase from Escherichia coli. RNA 1995;1:102-112.

[2] Lafontaine DL, Bousquet-Antonelli C, Henry Y, Caizergues-Ferrer M, Tollervey D; Medline: 98139521 The box
H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase. Genes Dev 1998;12:527-537.

[2202] 878. (UDPGP) UTP-glucose-1-phosphate uridylyltransferase

This family consists of UTP-glucose-1-phosphate uridylyltransferases (EC:2.7.7.9), also known as UDP-glucose
pyrophosphorylase (UDPGP) and glucose-1-phosphate uridylyltransferase. UTP-glucose-1-phosphate uridylyltransferase
catalyses the interconversion of MgUTP + glucose-1-phosphate and UDP-glucose + MgPPi [1]. UDP-glucose
is an important intermediate in mammalian carbohydrate interconversion, with various metabolic roles depending
on tissue type [1]. In Dictyostelium (slime mold), mutants in this enzyme abort the development cycle [2]. Also within
the family are UDP-N-acetylglucosamine pyrophosphorylase (Swiss:Q16222, AGX1) [3] and two hypothetical proteins
from Borrelia burgdorferi, the Lyme disease spirochaete (Swiss:O51893 and Swiss:O51036).
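The reaction above can be checked for element balance by counting atoms on each side. This is a minimal Python sketch, assuming neutral, fully protonated molecular formulas that the original text does not give (UTP C9H15N2O15P3, glucose-1-phosphate C6H13O9P, UDP-glucose C15H24N2O17P2, pyrophosphate H4P2O7); Mg2+ is left out because it appears on both sides. The helper names are hypothetical.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H13O9P'.

    Parentheses, charges and hydrates are not supported.
    """
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def side_total(species):
    """Sum atom counts over (formula, stoichiometric coefficient) pairs."""
    total = Counter()
    for formula, coeff in species:
        for element, n in parse_formula(formula).items():
            total[element] += coeff * n
    return total

# Assumed neutral formulas; Mg2+ omitted from both sides.
UTP = "C9H15N2O15P3"
GLC1P = "C6H13O9P"
UDP_GLC = "C15H24N2O17P2"
PPI = "H4P2O7"

left = side_total([(UTP, 1), (GLC1P, 1)])
right = side_total([(UDP_GLC, 1), (PPI, 1)])
print(left == right)  # prints True
```

Both sides sum to C15H28N2O24P4, which is consistent with the enzyme transferring a UMP unit from UTP to glucose-1-phosphate while releasing pyrophosphate.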

Number of members: 18 

[2203] 

[1] Duggleby RG, Chao YC, Huang JG, Peng HL, Chang HY; Medline: 96202932 Sequence differences between
human muscle and liver cDNAs for UDPglucose pyrophosphorylase and kinetic properties of the recombinant
enzymes expressed in Escherichia coli. Eur J Biochem 1996;235:173-179.

[2] Ragheb JA, Dottin RP; Medline: 87231075 Structure and sequence of a UDP glucose pyrophosphorylase gene
of Dictyostelium discoideum. Nucleic Acids Res 1987;15:3891-3906.

[3] Mio T, Yabe T, Arisawa M, Yamada-Okabe H; Medline: 98269105 The eukaryotic UDP-N-acetylglucosamine
pyrophosphorylases. Gene cloning, protein expression, and catalytic mechanism. J Biol Chem 1998;273:14392-14397.



[2204] 879. (UPF0044) Uncharacterized protein family UPF0044 signature

Cross-reference(s) PS01301; UPF0044

The following uncharacterized proteins have been shown [1] to be highly similar:

- Bacillus subtilis hypothetical protein yqeI.

- Escherichia coli hypothetical protein yhbY and HI1333, the corresponding Haemophilus influenzae protein.

- Methanococcus jannaschii hypothetical protein MJ0652.

These are small proteins of 10 to 15 kDa. They can be picked up in the database by the following pattern, which is
located in the N-terminal part of these proteins.

[2205] Consensus pattern: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G Sequences
known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in SWISS-PROT: NONE.
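Because this pattern is reported to lie in the N-terminal part of the proteins, a search can also record where each match falls. This is a minimal Python sketch that hand-translates the pattern into regular-expression syntax; the 60-residue N-terminal window, the function name and the synthetic motif are illustrative assumptions, not values from PROSITE.

```python
import re

# Hand translation of the UPF0044 consensus pattern:
# L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(2)-[LIV]-[GA]-x(2)-G
UPF0044_RE = re.compile(r"L[ST].{3}K.{3}[KR][SGA].[GA]H.L.P[LIV].{2}[LIV][GA].{2}G")

def upf0044_hits(sequence, n_terminal_window=60):
    """Return (1-based start, starts_within_window) for each non-overlapping match.

    n_terminal_window is a hypothetical cutoff standing in for the
    "N-terminal part" mentioned in the family description.
    """
    hits = []
    for m in UPF0044_RE.finditer(sequence):
        start = m.start() + 1
        hits.append((start, start <= n_terminal_window))
    return hits

motif = "LSAAAKAAAKSAGHALAPLAALGAAG"  # synthetic string satisfying the pattern
print(upf0044_hits("M" + motif + "K" * 100))  # prints [(2, True)]
print(upf0044_hits("K" * 100 + motif))        # prints [(101, False)]
```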
[2206] 880. (zf-A20) A20-like zinc finger. A20 (an inhibitor of cell death)-like zinc fingers. The zinc finger mediates
self-association in A20. These fingers also mediate IL-1-induced NF-kappa B activation.

Number of members: 22 



[2207] 



[1] Heyninck K, Beyaert R; Medline: 99126071 The cytokine-inducible zinc finger protein A20 inhibits IL-1-induced
NF-kappaB activation at the level of TRAF6. FEBS Lett 1999;442:147-150.

[2] De Valck D, Heyninck K, Van Criekinge W, Contreras R, Beyaert R, Fiers W; Medline: 96390831 A20, an inhibitor
of cell death, self-associates by its zinc finger domain. FEBS Lett 1996;384:61-64.

[3] Song HY, Rothe M, Goeddel DV; Medline: 96270609 The tumor necrosis factor-inducible zinc finger protein
A20 interacts with TRAF1/TRAF2 and inhibits NF-kappaB activation. Proc Natl Acad Sci U S A 1996;93:6721-6725.

[4] Opipari AW Jr, Boguski MS, Dixit VM; Medline: 90368626 The A20 cDNA induced by tumor necrosis factor
alpha encodes a novel type of zinc finger protein. J Biol Chem 1990;265:14705-14708.

[2208] 881. (zf-PARP) 

Poly(ADP-ribose) polymerase zinc finger domain 

Cross-reference(s) PS00347; PARP_ZN_FINGER_1; PS50064; PARP_ZN_FINGER_2

[2209] Poly(ADP-ribose) polymerase (EC 2.4.2.30) (PARP) [1,2] is a eukaryotic enzyme that catalyzes the covalent
attachment of ADP-ribose units from NAD(+) to various nuclear acceptor proteins. This post-translational modification
of nuclear proteins is dependent on DNA. It appears to be involved in the regulation of various important cellular
processes such as differentiation, proliferation and tumor transformation, as well as in the regulation of the molecular
events involved in the recovery of the cell from DNA damage. Structurally, PARP, which is about 1000 amino-acid
residues long, consists of three distinct domains: an N-terminal zinc-dependent DNA-binding domain, a central
automodification domain and a C-terminal NAD-binding domain. The DNA-binding region contains a pair of zinc finger
domains which have been shown to bind DNA in a zinc-dependent manner. The zinc finger domains of PARP seem to
bind specifically to single-stranded DNA. DNA ligase III [3] contains, in its N-terminal section, a single copy of a zinc
finger highly similar to those of PARP.

[2210] Consensus pattern: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C [The three C's and the H are
zinc ligands] Sequences known to belong to this class detected by the pattern: ALL. Other sequence(s) detected in
SWISS-PROT: NONE. Sequences known to belong to this class detected by the profile: ALL. Other sequence(s) detected
in SWISS-PROT: NONE.

[2211] Note: This documentation entry is linked to both signature patterns and a profile. As the profile is much more 
sensitive than the patterns, you should use it if you have access to the necessary software tools to do so. 
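The PARP zinc finger pattern contains a variable-length gap, x(16,18), which a regular expression expresses as a bounded repeat. This is a minimal Python sketch, assuming a hand translation of the pattern and using capture groups to report the four putative zinc ligands (the three Cys and the His); the function names and test sequences are hypothetical, and a pattern search of this kind is less sensitive than the profile recommended in the note above.

```python
import re

# C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C
# The three cysteines and the histidine (the zinc ligands) are captured.
PARP_ZF_RE = re.compile(r"(C)[KR].(C).{3}I.K.{3}[RG].{16,18}W[FYH](H).{2}(C)")

def zinc_ligand_positions(sequence):
    """Return a tuple of 1-based ligand positions (C, C, H, C) for each match."""
    return [
        tuple(m.start(g) + 1 for g in range(1, 5))
        for m in PARP_ZF_RE.finditer(sequence)
    ]

def synthetic_finger(gap):
    """Build a hypothetical test string obeying the pattern, with `gap`
    residues in the x(16,18) position."""
    return "CKAC" + "AAA" + "IAK" + "AAA" + "R" + "A" * gap + "WFH" + "AA" + "C"

print(zinc_ligand_positions("M" + synthetic_finger(17)))  # prints [(2, 5, 35, 38)]
print(zinc_ligand_positions("M" + synthetic_finger(15)))  # prints [] (gap too short)
```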

[1] Althaus F.R., Richter C. Mol. Biol. Biochem. Biophys. 37:1-126(1987).

[2] de Murcia G., Menissier de Murcia J. Trends Biochem. Sci. 19:172-176(1994).

[3] Wei Y.-F., Robins P., Carter K., Caldecott K., Pappin D.J.C., Yu G.-L., Wang R.-R., Shell B.K., Nash R.A., Schar
P., Barnes D.E., Haseltine W.A., Lindahl T. Mol. Cell. Biol. 15:3206-3216(1995).
A. Asparaginase 2



[2212] Asparaginase II (L-asparagine amidohydrolase II) is an extracellular protein that may be associated with the
cell wall and whose expression is affected by the availability of nitrogen. Asparaginase II catalyzes the reaction
L-asparagine + H2O = L-aspartate + NH3. As many leukemias have high requirements for asparagine, asparaginase
II proteins are useful as reagents for screening compounds for activity as leukemia chemotherapy products.
Asparaginase II protein can also be over- or under-expressed to alter amino acid content in plant tissues or to modify
nitrogen fixation and/or nitrogen metabolism in plants.
[2213] Ref: Bon et al. (1997) Appl Biochem Biotechnol 63-65: 203-12



B. Chloroa b-bind 



[2214] Chlorophyll a-b binding proteins are located in the thylakoid membranes of the chloroplast and bind chlorophyll
a and chlorophyll b, capturing light energy that drives photosynthesis. These proteins are useful in controlling the rate,
efficiency and/or output of photosynthesis. Overexpression of chlorophyll a-b binding proteins is expected to increase
the rate of photosynthesis.

Ref: Leutwiler et al. (1986) Nucleic Acids Res 14: 4051-64; Brandt et al. (1992) Plant Mol Biol 19: 699-703

C. DMRL synthase

[2215] DMRL synthase (6,7-dimethyl-8-ribityllumazine synthase) catalyzes the penultimate step in riboflavin (vitamin B2)
synthesis, condensing 5-amino-6-(D-ribitylamino)-2,4(1H,3H)-pyrimidinedione with L-3,4-dihydroxy-2-butanone
4-phosphate to produce 6,7-dimethyl-8-(1-D-ribityl)lumazine. The enzyme forms a homopentamer. Engineering of
these proteins, or of those with homologous sequences/structures, may allow control of the amounts of vitamin B2
available in plants and/or accumulation of pigment, as well as altering reactions requiring hydrogen ion carriers/transmitters.
Ref: Garcia-Ramirez et al. (1995) J Biol Chem 270: 23801-7



D. E1 N



[2216] These proteins are ATP-dependent DNA helicases that are required for initiation of viral DNA replication. They
form a complex with the viral E2 protein. The E1-E2 complex binds to the replication origin, which contains binding sites
for both proteins. The majority of sequences known for this group of proteins are from various papillomaviruses, a type
of double-stranded DNA virus. In plants, the prototype double-stranded DNA virus is Cauliflower mosaic virus (CaMV).
Manipulation of these proteins, especially to produce variant proteins that form non-productive complexes, enables
production of plants that are resistant to infection by double-stranded DNA viruses.

Ref: Yang et al. (1993) PNAS USA 90: 5086-90; Ustav and Stenlund (1991) EMBO J 10: 449-57; Callaway et al.
(1996) Mol Plant Microbe Interact 9: 810-8

E. EF1 G 

[2217] Elongation factor-1 is composed of four subunits: alpha, beta, delta and gamma. Gamma subunits are presumed
to play a role in anchoring the complex to other cellular components. Studies of EF-1 genes in plants suggest that
different forms of the EF-1 subunits may be expressed in particular organs or in response to stress. Manipulation of the
activity of these proteins, either by altered expression level or by structural mutation, may result in the accumulation
of a particular protein in a chosen organ or allow production of particular proteins during stress conditions.

Ref: Kinzy et al. (1994) NAR 22: 2703-7; Dunn et al. (1993) Plant Mol Biol 23: 221-5; Aguilar et al. (1991) Plant Mol
Biol 17: 351-60

F. ENV polyprotein

[2218] This family comprises the envelope or coat proteins known from a number of different retroviruses. In mammalian
species, retroviruses are responsible for diseases such as leukemia and AIDS (caused by HIV). In plants, retroviruses
are known in both monocot (e.g. Zeon-1) and dicot (e.g. Arabidopsis and tobacco) species and have been shown to
induce mutant alleles at new loci. Engineering of plant ENV proteins may allow mobilization or targeting of endogenous
or introduced retroviruses, in essence generating a new method for mutant production, gene tagging and the like.

Ref: Mamoun et al. (1990) J Virol 64: 4180-8; Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas (1998)
Genetics 149: 703-15

G. Glycosyl_hydr9

[2219] Proteins having this domain (previously known as the glycosyl hydrolase family 5 domain) catalyze the
endohydrolysis of 1,4-β-D-glucosidic linkages in cellulose. Numerous plant proteins with this domain exist and are
expressed in an organ-specific manner. They are involved in the fruit ripening process, in cell elongation and in plant
reproduction. Modulation of the activity of these proteins, either by over- or under-expression or by mutation of the
polypeptide, could be used to affect post-harvest physiology (e.g. rate of ripening) or to engineer reproductive
sterility.

Ref: Gbrda et al. (1990) Biochemistry 29: 7264-9; Tucker et al. (1988) Plant Physiol 88: 1257-62; Shani et al. (1997)
43: 837-42; Milligan and Gasser (1995) Plant Mol Biol 28: 691-711

H. Glycosyl_hydr14

[2220] The β-amylases (family 14 of glycosyl hydrolases) catalyze the hydrolysis of 1,4-α-glucosidic linkages in
polysaccharides and remove successive maltose units from the non-reducing ends of the chains. Mutants of β-amylase
in Arabidopsis exhibited altered degradation of starch throughout the diurnal cycle. In addition, the mutant phenotypes
indicated that these enzymes not only affect carbohydrate metabolism/catabolism, but also influence the amount of
pigment stored within particular cells. Manipulation of the β-amylase genes enables control of plant pigmentation (for
example, fibre pigment in cotton) as well as carbohydrate synthesis and degradation.

Ref: Zeeman et al. (1998) Plant J 15: 357-65; Hiiano and Nakamura (1997) Plant Physiol 114: 5675-82; Kitamoto et
al. (1988) J Bacteriol 170: 5848-54

I. Glycosyl_hydr15

[2221] Glycosyl hydrolases from family 15 (such as 1,4-alpha-D-glucan glucohydrolase) catalyze the hydrolysis of
terminal 1,4-linked alpha-D-glucose residues successively from the non-reducing ends of the chains, resulting in the
release of β-D-glucose. In plants these proteins have been tied to the mobilization of the xyloglucan stored in the
cotyledonary cell walls. Proteins such as these could be varied to affect the rate of plant growth (for example, during
germination), the storage and/or use of glucose and other sugars by plant tissues, and properties of plant cell walls
such as elasticity.

Ref: Crombie et al. (1998) Plant J 15: 27-38; Hata et al. (1991) Agric Biol Chem 55: 941-9

J. Glycosyl_hydr20

[2222] Members of the family 20 glycosyl hydrolases catalyze the hydrolysis of terminal non-reducing N-acetyl-D-
hexosamine residues in N-acetyl-β-D-hexosaminides. N-acetyl-β-glucosaminidase belongs to this family and exists in
several different forms (consisting of various combinations of alpha and beta chains) depending on the organism.
Family 20 glycosyl hydrolases have been implicated in lysosomal storage diseases (such as Sandhoff disease) and
glycogen storage disease in humans. These types of proteins are also responsible for the hydrolysis of chitin. In plants
these proteins could be useful in controlling carbohydrate catabolism, thereby influencing the amount of sugars
available for storage and/or use in other metabolic pathways. In addition, it is possible that such proteins could be used
to engineer an endogenous insect protection mechanism, e.g. by secretion of a chitin-hydrolyzing composition by the

Ref: Graham et al. (1988) J Biol Chem 263: 16823-9; O'Dowd et al. (1988) Biochemistry 27: 5216-26

K. HMG box

[2223] The HMG box is a novel type of DNA-binding domain found in a diverse group of proteins. Numerous plant
proteins contain this domain, such as the HMG1/2-like proteins. The expression of some of these HMG proteins appears
to be regulated by circadian rhythms and in a light-dependent manner, occurring at higher levels in roots, for example,
and at lower levels in light-grown tissues such as cotyledons. Generally, HMG proteins are thought to influence
transcription regulation. In plants, HMGs are believed to have a role in maintaining patterns of circadian-regulated
expression for other genes, suggesting that these proteins could be exploited to control growth and development.

Ref: Laudet et al. (1993) Nucleic Acids Res 21: 2493-501; Zheng et al. (1993) Plant Mol Biol 23: 813-23; Grasser et
al. (1993) Plant Mol Biol 23: 619-25

L. IL2 

[2224] Interleukin-2 (IL-2) is produced in mammals by T cells in response to antigenic or mitogenic stimulation and
is crucial for proper regulation and functioning of the immune response. IL-2 is capable of stimulating B cells, monocytes,
lymphokine-activated killer cells, natural killer cells and glioma cells. Plant extracts have also been shown to stimulate
the immune system (for example, mistletoe therapy for human cancer). It is known that IL-2 is involved in feedback
inhibition pathways that impact the inflammatory response as well as the growth inhibition of tumor-reactive T cells.
Plant proteins containing IL-2-like sequences are useful as immunity-based therapeutics, acting in a manner similar
to IL-2 in mammals.

Ref: Heike et al. (1997) Scand J Immunol 45: 221-6; Ariel et al. (1998) J Immunol 161: 2465-72; Schink (1997)
Anticancer Drugs 8 Suppl 1: S47-51

M. Oxidored FMN 

[2225] NADPH dehydrogenases catalyze the reaction NADPH + acceptor = NADP(+) + reduced acceptor. One member
of this family is yeast "old yellow enzyme" (OYE), which is thought to be involved in oxylipin metabolism. A second
yeast family member, estrogen-binding protein (EBP), binds estrogen in addition to exhibiting oxidoreductase
activity. An Arabidopsis homolog of OYE has been described, and estrogen-binding proteins in plants have been
reported. Plant proteins from this class have the potential to be used to modify lipid metabolism/catabolism. These
proteins may also have use as therapeutics for breast and prostate cancer, and other abnormal growth in steroid-sensitive
tissues.

Ref: Baker et al. (1998) Proc Soc Exp Biol Med 217: 317-21; Schaller and Weiler (1997) J Biol Chem 272: 28066-72;
Madani et al. (1994) PNAS USA 91: 922-6

N. Oxidored q2 

[2226] The NADH-plastoquinone oxidoreductases catalyze the reaction NADH + plastoquinone = NAD(+) + plastoquinol.
In plants these reactions occur in the chloroplast and are believed to participate in a chloroplast respiratory
system, in which the NDH complex is postulated to act as a valve to remove excess reduction equivalents in the
chloroplasts. Manipulation of these proteins may improve the rate or efficiency of photosynthesis.

Ref: Burrows et al. (1998) EMBO J 17: 868-76; Kofer et al. (1998) Mol Gen Genet 258: 166-73; Maier et al. (1995) J
Mol Biol 251: 614-28

O. PABP 

[2227] Polyadenylate binding proteins bind the poly(A) tail of mRNA. Plants, as exemplified by Arabidopsis, contain
numerous PABP genes that are expressed in an organ-specific manner. For example, PABP2 is functional in roots and
shoots, while PABP5 is expressed predominantly in immature flowers. The PABP proteins are implicated in numerous
aspects of post-transcriptional regulation, including mRNA turnover and translational initiation. Control of the activity
of PABP proteins provides the ability to control the expression of various genes in particular organs during development.

Ref: Hilson et al. (1993) Plant Physiol 103: 525-33; Belostotsky and Meagher (1993) PNAS USA 90: 6686-90

P. Parvo coat

[2228] Parvoviruses are linear single-stranded DNA viruses that are encapsulated by three capsid proteins. Plants
are susceptible to infection by single-stranded DNA viruses such as Maize streak virus (MSV) and various
geminiviruses. The coat proteins in these plant viruses are critical to the virus life cycle within the plant. For example,
the coat protein of MSV is thought to be involved in intra- and inter-cellular movement within the plant. Engineering of
proteins having similarity to parvoviral coat proteins, especially to produce proteins that interfere with maturation of the
virus particle, enables the production of plants having better resistance to natural plant single-stranded DNA viruses.

Ref: Liu et al. (1997) J Gen Virol 78: 1265-70; Rohde et al. (1990) Virology 176: 648-51

Q. Pkinase C 

[2229] Plant serine/threonine protein kinases possessing this domain are expressed in all tissues and are known to
undergo serine-specific autophosphorylation and to specifically phosphorylate two ribosomal proteins, P14 and P16.
During development, these proteins predominate during high metabolic activity in growing buds, root tips, leaf margins
and germinating seeds. They are thought to be involved in the control of plant growth and development. In addition,
two genes encoding proteins from this family have been described that help plant cells adapt during cold or high-salt
stresses. Consequently, engineering Pkinase C proteins provides a way to control general growth/development of the
plant as well as a means to provide endogenous protection against environmental stresses.

Ref: Zhang et al. (1994) J Biol Chem 269: 17586-92; Mizoguchi et al. (1995) FEBS Lett 358: 199-204

R. REV

[2230] The REV proteins act post-transcriptionally to relieve negative repression of GAG and ENV production in
retroviruses such as human immunodeficiency virus type 1 (HIV-1). Plants contain retrovirus-like viruses such as
pararetroviruses and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in
particular have been used to create mutations at various loci, thereby permitting gene isolation, gene tagging and the
like. Manipulation of plant REV proteins enables control of transposition frequencies of corresponding transposable
elements and provides a new tool for genetic engineering of plants.

Ref: Sodroski et al. (1986) Nature 321: 412-7; Franchini et al. (1989) PNAS USA 86: 2433-7; Marquet et al. (1995)
77: 113-24; Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas (1998) Genetics 149: 703-15

S. RuBisCo small 

[2231] Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo) catalyzes the initial step in the C3 photosynthetic
carbon reduction cycle, adding carbon dioxide to D-ribulose 1,5-bisphosphate to form two molecules of
3-phospho-D-glycerate. RuBisCo is composed of two subunits: a large subunit, which is synthesized in the chloroplast,
and a small subunit, which is synthesized in the cytoplasm and then transported into the chloroplast. The expression
of the small subunit of RuBisCo is light regulated. Manipulation of these proteins could increase the efficiency of
photosynthesis or allow alterations in developmental timing.

Ref: Giuliano et al. (1988) PNAS USA 85: 7089-93; Dedonder et al. (1993) Plant Physiol 101: 801-8

T. Sialyltransf

[2232] Members of the CMP-N-acetylneuraminate-β-galactosamide-α-2,3-sialyltransferase family catalyze the
following reaction:

CMP-N-acetylneuraminate + β-D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R = CMP + α-N-acetylneuraminyl-2,3-β-
D-galactosyl-1,3-N-acetyl-α-D-galactosaminyl-R.

These proteins are thought to be responsible for the synthesis of the sequence NeuAc-α-2,3-Gal-β-1,3-GalNAc- found on
sugar chains O-linked to threonine or serine and also as a terminal sequence on certain gangliosides in mammalian cells.
In plants, glycosyltransferases in the Golgi apparatus synthesize cell wall polysaccharides and elaborate the complex
glycans of glycoproteins. Engineering of plant sialyltransferases allows targeting of proteins to particular cellular
locations or enables changes in cell wall structure.

Ref: [...] Chem 269: 10028-33; Kitagawa and Paulson (1994) J Biol Chem 269: 1394-401
U. Signal

[2233] Many plant proteins in this family contain sequences similar to those found in both components of the
prokaryotic family of signal transducers known as the two-component systems. This suggests that activation may
require a transfer of a phosphate group between the transmitter domain and the receiver domain. One family member
in Arabidopsis appears to be involved in ethylene (a plant hormone) signal transduction. Other proteins in this family
appear to be involved in the regulation of gene transcription under conditions of environmental stress. Signal proteins
can be exploited to affect plant growth and development and/or to control plant responses to stress conditions such as
cold, nutrient availability, etc.

Ref: Chang et al. (1993) Science 262: 539-44; Nagaya et al. (1993) Gene 131: 119-124; Gottfert et al. (1990) PNAS
USA 87: 2680-4

V. vMSA 

[2234] vMSA proteins are major surface antigens present on the envelope of various retroviruses. Surface antigens
of retroviruses are often involved in tropism of the virus. Plants contain retrovirus-like viruses such as pararetroviruses
and retrotransposons (i.e. transposons having long terminal repeats). Plant retrotransposons in particular have
been used to create mutants at various loci, thereby permitting gene isolation, gene tagging and the like. Manipulation
of plant vMSA proteins enables control of the tropism of plant retroviruses that might be used as genetic engineering
tools, thus enabling targeting of the virus to particular species and/or tissues of plants.

Ref: Okamoto et al. (1988) J Gen Virol 69: 2575-83; Grandbastien et al. (1989) Nature 337: 376-80; Wright and Voytas
(1998) Genetics 149: 703-15

W. zf-CCCH 

[2235] This family of proteins is defined by having two CX(8)CX(5)CX(3)H-type zinc finger domains. These proteins 
cover a broad range of functions. For example, the COP1 protein acts as a repressor of photomorphogenesis in dark- 
ness; light stimuli abolish this suppressive action. In addition, COP1 protein can function as a negative transcriptional 
regulator capable of direct interaction with components of the G-protein signaling pathway. As a second example, a
zf-CCCH protein identified in Arabidopsis appears to be involved in resistance to DNA damage induced by UV light
and chemical DNA-damaging agents. Overexpression of this class of proteins permits production of plants that are 
better suited to adverse environments. Manipulation of expression of zf-CCCH proteins functioning as transcriptional 
regulators, such as COP1, enables manipulation of some signal transduction pathways. 
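The CX(8)CX(5)CX(3)H notation maps directly onto a regular expression. This is a minimal Python sketch, assuming that two non-overlapping matches are enough to flag a candidate member of this two-finger family; it is a crude motif screen, not a Pfam HMM search, and the function names and test sequence are hypothetical.

```python
import re

# C-x(8)-C-x(5)-C-x(3)-H zinc finger, as described for this family.
CCCH_RE = re.compile(r"C.{8}C.{5}C.{3}H")

def ccch_finger_starts(sequence):
    """Return 1-based start positions of non-overlapping C-x8-C-x5-C-x3-H motifs."""
    return [m.start() + 1 for m in CCCH_RE.finditer(sequence)]

def has_two_ccch_fingers(sequence):
    """Crude screen: at least two non-overlapping CCCH motifs."""
    return len(ccch_finger_starts(sequence)) >= 2

finger = "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H"  # 20 residues
protein = "M" + finger + "G" * 10 + finger
print(ccch_finger_starts(protein))    # prints [2, 32]
print(has_two_ccch_fingers(protein))  # prints True
```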

Ref: Pang et al. (1993) Nucleic Acids Res 21: 1647-53; Deng et al. (1992) Cell 71: 791-801

X. zf-RanBP

[2236] Proteins in this category contain many X-X-F-G and X-F-X-F-G repeats, and may contain
RanBP1-like or PPIase domains. Plant proteins having domains similar to these include PAS1 and GMSTI. PAS1 has
been shown to have dramatic developmental effects that appear to be correlated with both cell division and cell wall
elongation. GMSTI has high identity to the yeast STI stress-inducible gene and has been shown to be heat inducible.
Proteins such as these may be useful for controlling growth and form of development.

Ref: Vittorioso et al. (1998) Mol Cell Biol 18: 3034-43; Hernandez Torres et al. (1995) 27: 1221-6

Y. Peptidase M48

[2237] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are located
in the membranes of the endoplasmic reticulum. They function in NH2-terminal proteolytic processing, as shown for
[...] the correct processing of a-factor, a yeast pheromone. Family M48 peptidases also appear to be required for some
prenylation reactions, mediating COOH-terminal CAAX processing. Prenylation reactions are believed to be involved
in the regulation of protein-protein and protein-membrane interactions. As an example, RAS GTPase activity is
regulated in part by localization to the inner side of the plasma membrane upon prenylation. In plants, proteins from
this family could be involved in pollen-stigma interactions, such as those mediating self-pollination vs. outcrossing, or
could be members of several secondary metabolism pathways.






EP1033 405 A2 



Ref: Fujimura-Kamada et al. (1997) J Cell Biol 136: 271-85; Tam et al. (1998) J Cell Biol 142: 635-49.

Z. DNA Pol Viral N



[2238] The DNA pol Viral N domain is located at the N-terminal region of the DNA polymerase isolated from several
retroid viruses such as Cauliflower mosaic virus. The domain motif has also been found in numerous other species,
from humans to cyanobacteria. In these organisms, this motif seems to be associated with two types of sequences:
retrotransposons and mitochondrial genes. In the mitochondrial sequences this domain is potentially involved in the
self-splicing conducted by group II introns. Various manipulations of this gene in plants allow control of the numerous
retrotransposons endogenous to plant genomes or allow engineering of mitochondrial function, especially to increase
efficiency of energy utilization by cells.

REF: Chapdelaine and Bonen (1991) Cell 65: 465-72; Ferat and Michel (1993) Nature 364: 358-61; Wilson et al. (1994)
368: 32-8; Cambareri et al. (1994) 242: 658-65; Gaardner et al. (1981) NAR 9: 2871-2888; Cummings et al.
(1990) Curr Genet 17: 375-402; Hattori et al. (1986) Nature 321: 625-8

Aa. Calpain Inhib 

[2239] This domain is found in calpastatin, an inhibitor protein specific for calpain. Calpain is a non-lysosomal
calcium-dependent intracellular protease that appears to be involved in the dynamic changes of the cytoskeleton,
especially actin-related structures, during early Drosophila embryogenesis [1]. Calpastatins co-exist in cells with calpains,
and the subcellular distribution of calpastatin is thought to be important to calpain regulation [2]. In plants, calpains and
calpastatins could be involved in embryogenesis and non-embryogenic organ reiteration. Mutations occurring in calpain
inhibitor repeat domains would produce developmental abnormalities such as abnormal leaf, root or flower development.

[2240] Refs 



1 Emori Y and Saigo K (1994) J Biol Chem 269: 25137-42. 

2 Mellgren RL. Lane RD, Mericle MT (1989) Biochim Biophys Acta 999: 71-77. 



Ab. chorismate bind 



[2241] Chorismate binding domains are present in plant anthranilate synthase (AS) genes. AS enzymes catalyze the
first step in the biosynthesis of tryptophan by converting chorismate and L-glutamine to anthranilate, pyruvate and
L-glutamate. Some of these enzymes are subject to feedback inhibition by tryptophan [1], while some are feedback
insensitive [2]. In Arabidopsis, two AS genes have overlapping but different distributions. One of these AS genes is
induced by wounding and bacterial pathogen infiltration [1]. Mutations in the chorismate binding domain would affect
the production of tryptophan and could influence the plant's defense system. AS gene products can be used for in vitro
synthesis of tryptophan and tryptophan derivatives.
[2242] Refs 



1 Niyogi KK, Fink GR (1992) Plant Cell 4: 721-33. 

2 Song HS, Brotherton JE, Gonzales RA, Widholm JM (1998) Plant Physiol 117: 533-43.

Ac. late protein L2



[2243] Papillomaviruses are encapsulated double-stranded DNA viruses. Plants are susceptible to infection by
double-stranded DNA viruses such as Cauliflower mosaic virus (CaMV). The coat proteins in these plant viruses are
critical to the virus life cycle within the plant. For example, the coat protein of CaMV is thought to be involved in intra-
and inter-cellular movement within the plant [1]. Engineering of proteins having similarity to papillomavirus coat
proteins may enable the production of plants having better resistance to natural plant double-stranded DNA viruses.
[2244] Refs 

1 Thompson SR, Melcher U (1993) J Gen Virol 74: 1141-8. 



Ad. Peptidase M41 



[2245] Proteins belonging to this peptidase family are metalloproteases that bind zinc as a cofactor and are integral membrane proteins. They seem to be involved in the degradation of carboxy-terminal-tagged cytoplasmic proteins. In plants, these proteins are located in the thylakoid membranes of the chloroplasts, their expression is light-regulated, and they are thought to be involved in the degradation of soluble stromal proteins and the turnover of thylakoid proteins [1]. Manipulation of the expression and structure of these proteins would affect the efficiency of photosynthesis and the development of chloroplasts.
[2246] Refs 

1 Lindahl M, Tabak S, Cseke L, Pichersky E, Andersson B, Adam Z (1996) J Biol Chem 271: 29329-34.
Ae. UPF0051 

[2247] There is some evidence that, in plants, proteins in this family are involved in ATP synthesis in chloroplasts [1, 2]. Mutations in these proteins or altering their expression would affect the efficiency of photosynthesis and energy production.
[2248] Refs 

1 Kostrzewa M, Zetsche K (1992) J Mol Biol 227: 961-70. 

2 Kostrzewa M, Zetsche K (1993) Plant Mol Biol 23: 67-76.

Af. E7 



[2249] Papillomaviruses are encapsulated double-stranded DNA viruses. The papillomavirus early protein 7 (E7) is known as a potent immortalizing and transforming agent. Transformation by E7 is thought to be mediated by the physical association of E7 with cellular proteins regulating entry into the cell cycle [1]. The result is entry into the cell cycle and suppression of terminal differentiation in mammalian cells. Thus, engineering of proteins having similarity to papillomavirus E7 protein enables the production of plants having altered cellular proliferation characteristics and possibly altered morphology. For example, overexpression of E7-like proteins would be expected to result in proliferation of cells of the tissue in which the E7 protein is expressed, perhaps with suppression of differentiation events. Thus, for example, overexpression of E7-like proteins in meristem cells can result in taller plants and suppression of leafing and/or flowering.
[2250] Refs 

1 Zwerschke W, Jansen-Dürr P (2000) Adv Cancer Res 78: 1-29.
Ag. Peptidase U7

[2251] This protein is known to be an integral membrane protein in the cyanobacterium Synechocystis, where it functions to digest cleaved signal peptides [1]. This activity is necessary to maintain proper secretion of mature proteins across the membrane. In higher plants, this protein may be present in the plastid or chloroplast membranes, where it would function by enabling protein movement into and out of the chloroplasts. Mutations in this protein would be expected to affect the development of plastids, including chloroplasts, or alter the energy transfer system within the chloroplasts, thereby affecting growth and development.
[2252] Refs 

1 Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S, Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, Watanabe A, Yamada M, Yasuda M, Tabata S (1996) DNA Res 3: 109-36.

A. Activities of Polypeptides Comprising Signal Peptides

[2253] Polypeptides comprising signal peptides are a family of proteins that are typically targeted (1) to a particular organelle or intracellular compartment, (2) to interact with a particular molecule, or (3) for secretion outside of a host cell. Examples of polypeptides comprising signal peptides include, without limitation, secreted proteins, soluble proteins, receptors, proteins retained in the ER, etc.

[2254] These proteins comprising signal peptides are useful to modulate ligand-receptor interactions, cell-to-cell communication, signal transduction, intracellular communication, and activities and/or chemical cascades that take place in an organism outside or within any particular cell.

[2255] One class of such proteins is soluble proteins, which are transported out of the cell. These proteins can act as ligands that bind to receptors to trigger signal transduction or to permit communication between cells.
[2256] Another class is receptor proteins which also comprise a retention domain that lodges the receptor protein in 
the membrane when the cell transports the receptor to the surface of the cell. Like the soluble ligands, receptors can 
also modulate signal transduction and communication between cells. 
[2257] In addition, the signal peptide itself can serve as a ligand for some receptors. An example is the interaction of the ER-targeting signal peptide with the signal recognition particle (SRP). Here, the SRP binds to the signal peptide, halting translation, and the resulting SRP complex then binds to docking proteins located on the surface of the ER, prompting transfer of the protein into the ER.

[2258] Signal peptide residue composition is described below in Subsection IV.C.1.
III. Methods of Modulating Polypeptide Production 

[2259] It is contemplated that polynucleotides of the invention can be incorporated into a host cell or an in vitro system to modulate polypeptide production. For instance, the SDFs prepared as described herein can be used to prepare expression cassettes useful in a number of techniques for suppressing or enhancing expression.
[2260] For example, polynucleotides of the present invention comprising sequences to be transcribed, such as coding sequences, can be inserted into nucleic acid constructs to modulate polypeptide production. Typically, such sequences to be transcribed are heterologous to at least one element of the nucleic acid construct, generating a chimeric gene or construct.

[2261] Another example of useful polynucleotides are nucleic acid molecules comprising regulatory sequences of the present invention. Chimeric genes or constructs can be generated when the regulatory sequences of the invention are linked to heterologous sequences in a vector construct. Such chimeric genes and/or constructs are within the scope of the invention.

[2262] Also within the scope of the invention are nucleic acid molecules wherein at least a part or fragment of these DNA molecules is presented in REF AND SEQ TABLES 1 AND 2 of the present application, and wherein the coding sequence is under the control of its own promoter and/or its own regulatory elements. Such molecules are useful for transforming the genome of a host cell, or of an organism regenerated from said host cell, to modulate polypeptide production.

[2263] Additionally, a vector capable of producing the oligonucleotide can be inserted into the host cell to deliver the 
oligonucleotide. 

[2264] Components to be included in vector constructs are described in more detail both above and below.

[2265] Whether the chimeric vectors or native nucleic acids are utilized, such polynucleotides can be incorporated 
into a host cell to modulate polypeptide production. Native genes and/or nucleic acid molecules can be effective when 
exogenous to the host cell. 

[2266] Methods of modulating polypeptide expression include, without limitation:
Suppression methods, such as 

Antisense 

Ribozymes 

Co-suppression 

Insertion of Sequences into the Gene to be Modulated 
Regulatory Sequence Modulation. 

as well as Methods for Enhancing Production, such as 

Insertion of Exogenous Sequences; and 
Regulatory Sequence Modulation. 

III.A. Suppression

[2267] Expression cassettes of the invention can be used to suppress expression of endogenous genes which comprise the SDF sequence. Inhibiting expression can be useful, for instance, to tailor the ripening characteristics of a fruit (Oeller et al., Science 254:437 (1991)), to influence seed size (WO9a/07842) or to provoke cell ablation (Mariani et al., Nature 357:384-387 (1992)).

[2268] As described above, a number of methods can be used to inhibit gene expression in plants, such as antisense, ribozymes, introduction of exogenous genes into a host cell, insertion of a polynucleotide sequence into the coding sequence and/or the promoter of the endogenous gene of interest, and the like.

III.A.1. Antisense

[2269] An expression cassette as described above can be transformed into a host cell or plant to produce an antisense strand of RNA. In plant cells, antisense RNA inhibits gene expression by preventing the accumulation of mRNA which encodes the enzyme of interest; see, e.g., Sheehy et al., Proc. Natl. Acad. Sci. USA 85:8805 (1988) and Hiatt et al., U.S. Patent No. 4,801,340.
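The antisense strand referred to above is the reverse complement of the sense (coding) strand, which is why an antisense cassette carries the sequence in inverted orientation relative to the promoter. As a minimal sketch, assuming DNA written 5' to 3' and restricted to A, C, G and T, the antisense sequence and its RNA form can be derived as follows; the function names and the example sequence are hypothetical and are not taken from the REF AND SEQ TABLES.

```python
# Complement map for unambiguous DNA bases (assumption: no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_of(sense: str) -> str:
    """Return the reverse complement (written 5'->3') of a sense-strand DNA sequence."""
    if set(sense.upper()) - set("ACGT"):
        raise ValueError("only A, C, G and T are handled in this sketch")
    return sense.translate(_COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Write a DNA sequence as the corresponding RNA (T -> U)."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    sense = "ATGGCTTCCAAG"  # hypothetical coding-strand fragment
    antisense = antisense_of(sense)
    print(antisense)          # CTTGGAAGCCAT
    print(to_rna(antisense))  # CUUGGAAGCCAU
```

Reversing the complement, rather than only complementing it, keeps the result in the conventional 5' to 3' orientation, so it can be read directly as the strand to be transcribed from the antisense cassette.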

III.A.2. Ribozymes

[2270] Similarly, ribozyme constructs can be transformed into a plant to cleave mRNA and down-regulate translation.
III.A.3. Co-Suppression

[2271] Another method of suppression is to introduce an exogenous copy of the gene to be suppressed. Introduction of expression cassettes in which a nucleic acid is configured in the sense orientation with respect to the promoter has been shown to prevent the accumulation of mRNA. This method is described in detail above.

III.A.4. Insertion of Sequences into the Gene to be Modulated

[2272] Yet another means of suppressing gene expression is to insert a polynucleotide into the gene of interest to disrupt transcription or translation of the gene.

[2273] Site-specific recombination, for example using the Cre-lox system, could be used to target a polynucleotide insert to a gene (A.C. Vergunst et al., Nucleic Acids Res. 26:2729 (1998); A.C. Vergunst et al., Plant Mol. Biol. 38:393 (1998); H. Albert et al., Plant J. 7:649 (1995)).
[2274] In addition, random insertion of polynucleotides into a host cell genome can also be used to disrupt the gene of interest (Azpiroz-Leehan et al., Trends in Genetics 13:152 (1997)). In this method, screening a library of clones containing random insertions is preferred for identifying those that have polynucleotides inserted into the gene of interest. Such screening can be performed using probes and/or primers described above based on sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto. The screening can also be performed by selecting clones or any transgenic plants having a desired phenotype.
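Screening an insertion library with a probe or primer derived from the target gene amounts, in silico, to asking whether a clone's flanking sequence contains that probe on either strand. The sketch below assumes exact matching and that a flanking sequence for each clone is already available; real hybridization or PCR screening tolerates mismatches, and the clone names, sequences and probe shown are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def probe_hits(target: str, probe: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each exact probe match on either strand of target."""
    target, probe = target.upper(), probe.upper()
    rc = reverse_complement(probe)
    k = len(probe)
    hits = []
    for i in range(len(target) - k + 1):
        window = target[i:i + k]
        if window == probe:
            hits.append((i, "+"))
        elif window == rc:  # palindromic probes are reported once, on "+"
            hits.append((i, "-"))
    return hits


def screen_clones(flanks: dict[str, str], probe: str) -> list[str]:
    """Names of clones whose flanking sequence matches the probe on either strand."""
    return [name for name, seq in flanks.items() if probe_hits(seq, probe)]


if __name__ == "__main__":
    flanks = {  # hypothetical insertion-site flanking sequences
        "clone_1": "TTATGGCTTCCGG",
        "clone_2": "CCGGAAGCCATAA",
        "clone_3": "AAAAAAAAAAAAA",
    }
    print(screen_clones(flanks, "ATGGCTTCC"))  # ['clone_1', 'clone_2']
```

Checking both strands matters because an insertion can land in either orientation relative to the gene, so a flanking read may carry the gene sequence as its reverse complement.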

III.A.5. Regulatory Sequence Modulation

[2275] The SDFs described in REF AND SEQ TABLES 1 AND 2, and fragments thereof, are examples of polynucleotides of the invention that contain regulatory sequences that can be used to suppress or inactivate transcription and/or translation from a gene of interest as discussed in I.C.5.

III.A.6. Genes Comprising Dominant-Negative Mutations

[2276] When suppression of production of the endogenous, native protein is desired, it is often helpful to express a gene comprising a dominant-negative mutation. Production of protein variants from genes comprising dominant-negative mutations is a useful research tool. Genes comprising dominant-negative mutations can produce a variant polypeptide which is capable of competing with the native polypeptide, but which does not produce the native result. Consequently, overexpression of genes comprising these mutations can titrate out an undesired activity of the native protein. For example, the product from a gene comprising a dominant-negative mutation of a receptor can be used to constitutively activate or suppress a signal transduction cascade, allowing examination of the phenotype and thus the trait(s) controlled by that receptor and pathway. Alternatively, the protein arising from the gene comprising a dominant-negative mutation can be an inactive enzyme still capable of binding to the same substrate as the native protein and therefore competing with the native protein.

[2277] Products from genes comprising dominant-negative mutations can also act upon the native protein itself to prevent activity. For example, the native protein may be active only as a homomultimer or as one subunit of a heteromultimer. Incorporation of an inactive subunit into the multimer with native subunit(s) can inhibit activity.
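The titration effect of an inactive subunit can be estimated under simplifying assumptions that the text does not state: subunits assemble at random, a complex containing even one variant subunit is inactive, and the variant makes up a fraction f of the total subunit pool. For a homomultimer of n subunits, the expected fraction of fully native, active complexes is then

```latex
P_{\text{active}} = (1 - f)^{n},
\qquad
f = \frac{[\text{variant}]}{[\text{variant}] + [\text{native}]}
```

For example, with equal amounts of native and variant subunits (f = 0.5) in a tetramer (n = 4), P_active = 0.5^4 = 0.0625, so only about 6% of complexes remain active, whereas a monomeric variant only dilutes the native protein, which still makes up 1 - f = 50% of the molecules. Under these assumptions, dominant-negative subunits are expected to be especially effective against multimeric proteins.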
[2278] Thus, gene function can be modulated in host cells of interest by inserting into these cells vector constructs that comprise a gene with a dominant-negative mutation.

III.B. Enhanced Expression 

[2279] Enhanced expression of a gene of interest in a host cell can be accomplished by either (1) insertion of an exogenous gene or (2) promoter modulation.
III.B.1. Insertion of an Exogenous Gene

[2280] Insertion of an expression construct encoding an exogenous gene can boost the number of gene copies expressed in a host cell.

[2281] Such expression constructs can comprise genes that encode either the native protein that is of interest or a variant that exhibits enhanced activity as compared to the native protein. Such genes encoding proteins of interest can be constructed from the sequences from REF AND SEQ TABLES 1 AND 2, fragments thereof and substantially similar sequences thereto.

[2282] Such an exogenous gene can include either a constitutive promoter permitting expression in any cell in a host organism or a promoter that directs transcription only in particular cells or times during a host cell life cycle or in response to environmental stimuli.

III.B.2. Regulatory Sequence Modulation

[2283] The SDFs of REF AND SEQ TABLES 1 AND 2, and fragments thereof, contain regulatory sequences that can be used to enhance expression of a gene of interest. For example, some of these sequences contain useful enhancer elements. In some cases, duplication of enhancer elements or insertion of exogenous enhancer elements will increase expression of a desired gene from a particular promoter. As other examples, all promoters require binding of a regulatory protein to be activated, while some promoters may need a protein that signals a promoter binding protein to expose a polymerase binding site. In either case, over-production of such proteins can be used to enhance expression of a gene of interest by increasing the activation time of the promoter.

[2284] Such regulatory proteins are encoded by some of the sequences in REF AND SEQ TABLES 1 AND 2, fragments thereof, and substantially similar sequences thereto.
[2285] Coding sequences for these proteins can be constructed as described above. 

IV. Gene Constructs and Vector Construction 

[2286] To use a sequence of the present invention or a combination of them or parts and/or mutants and/or fusions of said SDFs in the above techniques, recombinant DNA vectors which comprise said SDFs and are suitable for transformation of cells, such as plant cells, are usually prepared. The SDF construct can be made using standard recombinant DNA techniques (Sambrook et al. 1989) and can be introduced to the species of interest by Agrobacterium-mediated transformation or by other means of transformation (e.g., particle gun bombardment) as referenced below.

[2287] The vector backbone can be any of those typical in the art, such as plasmids, viruses, artificial chromosomes, BACs, YACs and PACs and vectors of the sort described by

(a) BAC: Hamilton et al., Proc. Natl. Acad. Sci. USA 93:9975-9979 (1996);

(b) YAC: Burke et al., Science 236:806-812 (1987);

(c) PAC: Sternberg N. et al., Proc Natl Acad Sci USA 87(1):103-7 (1990);

(d) Bacteria-Yeast Shuttle Vectors: Bradshaw et al., Nucl Acids Res 23:4850-4856 (1995);

(e) Lambda Phage Vectors: Replacement Vector, e.g., Frischauf et al., J Mol Biol 170:827-842 (1983); or Insertion vector, e.g., Huynh et al., In: Glover DM (ed) DNA Cloning: A Practical Approach, Vol. 1, Oxford: IRL Press (1985);

(f) T-DNA gene fusion vectors: Walden et al., Mol Cell Biol 1:175-194 (1990); and

(g) Plasmid vectors: Sambrook et al., supra.

[2288] Typically, a vector will comprise the exogenous gene, which in its turn comprises an SDF of the present invention to be introduced into the genome of a host cell, and which gene may be an antisense construct, a ribozyme construct, a chimeraplast, or a coding sequence with any desired transcriptional and/or translational regulatory sequences, such as promoters, UTRs, and 3' end termination sequences. Vectors of the invention can also include origins of replication, scaffold attachment regions (SARs), markers, homologous sequences, introns, etc.
[2289] A DNA sequence coding for the desired polypeptide, for example a cDNA sequence encoding a full-length protein, will preferably be combined with transcriptional and translational initiation regulatory sequences which will direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.
[2290] For example, for over-expression, a plant promoter fragment may be employed that will direct transcription of the gene in all tissues of a regenerated plant. Alternatively, the plant promoter may direct transcription of an SDF of the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental control (inducible promoters).

[2291] If proper polypeptide production is desired, a polyadenylation region at the 3'-end of the coding region is typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes or from T-DNA.

[2292] The vector comprising the sequences from genes or SDFs of the invention may comprise a marker gene that confers a selectable phenotype on plant cells. The marker gene can include, for instance, a promoter and a coding sequence. For example, the marker may encode biocide resistance, particularly antibiotic resistance, such as resistance to kanamycin, G418, bleomycin or hygromycin, or herbicide resistance, such as resistance to chlorsulfuron or phosphinothricin.

IV.A. Coding Sequences

[2293] Generally, the sequence in the transformation vector that is to be introduced into the genome of the host cell does not need to be absolutely identical to an SDF of the present invention. Also, it is not necessary for it to be full length, relative to either the primary transcription product or fully processed mRNA. Furthermore, the introduced sequence need not have the same intron or exon pattern as a native gene. Also, heterologous non-coding segments can be incorporated into the coding sequence without changing the desired amino acid sequence of the polypeptide to be produced.
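Because the genetic code is degenerate, an introduced coding sequence can differ from the native nucleotide sequence while still encoding the desired amino acid sequence. The sketch below, assuming the standard nuclear genetic code (NCBI translation table 1) and two hypothetical sequences, checks that a recoded sequence with synonymous codons translates to the same peptide as the native one. It illustrates synonymous codon substitution only, not the handling of introns or other non-coding segments.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases ordered T, C, A, G,
# first position varying slowest. "*" marks a stop codon.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


if __name__ == "__main__":
    native = "ATGGCTAAACTG"   # hypothetical native fragment: Met-Ala-Lys-Leu
    recoded = "ATGGCCAAGCTT"  # hypothetical variant with synonymous codons 2-4
    print(native != recoded)                        # True
    print(translate(native), translate(recoded))    # MAKL MAKL
```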

IV.B. Promoters 

[2294] As explained above, introducing an exogenous SDF from the same species or an orthologous SDF from another species can modulate the expression of a native gene corresponding to that SDF of interest. Such an SDF construct can be under the control of either a constitutive promoter or a highly regulated inducible promoter (e.g., a copper-inducible promoter). The promoter of interest can initially be either endogenous or heterologous to the species in question. When re-introduced into the genome of said species, such a promoter becomes exogenous to said species. Over-expression of an SDF transgene can lead to co-suppression of the homologous endogenous sequence, thereby creating some alterations in the phenotypes of the transformed species, as demonstrated by similar analysis of the chalcone synthase gene (Napoli et al., Plant Cell 2:279 (1990) and van der Krol et al., Plant Cell 2:291 (1990)). If an SDF is found to encode a protein with desirable characteristics, its over-production can be controlled so that its accumulation can be manipulated in an organ- or tissue-specific manner utilizing a promoter having such specificity.
[2295] Likewise, if the promoter of an SDF (or an SDF that includes a promoter) is found to be tissue-specific or 
developmentally regulated, such a promoter can be utilized to drive or facilitate the transcription of a specific gene of 
interest (e.g., seed storage protein or root-specific protein). Thus, the level of accumulation of a particular protein can
be manipulated or its spatial localization in an organ- or tissue-specific manner can be altered.

IV.C. Signal Peptides

[2296] SDFs of the present invention containing signal peptides are indicated in the REF and SEQ TABLES. In some cases, it may be desirable for the protein encoded by an introduced exogenous or orthologous SDF to be targeted (1) to a particular organelle or intracellular compartment, (2) to interact with a particular molecule such as a membrane molecule, or (3) for secretion outside of the cell harboring the introduced SDF. This will be accomplished using a signal peptide.

[2297] Signal peptides direct protein targeting, are involved in ligand-receptor interactions and act in cell-to-cell communication. Many proteins, especially soluble proteins, contain a signal peptide that targets the protein to one of several different intracellular compartments. In plants, these compartments include, but are not limited to, the endoplasmic reticulum (ER), mitochondria, plastids (such as chloroplasts), the vacuole, the Golgi apparatus, protein storage vesicles (PSV) and, in general, membranes. Some signal peptide sequences are conserved, such as the Asn-Pro-Ile-Arg amino acid motif found in the N-terminal propeptide signal that targets proteins to the vacuole (Marty (1999) The Plant Cell 11:587-599). Other signal peptides do not have a consensus sequence per se, but are largely composed of hydrophobic amino acids, such as those signal peptides targeting proteins to the ER (Vitale and Denecke (1999) The Plant Cell 11:615-628). Still others do not appear to contain either a consensus sequence or an identified common secondary structure, for instance chloroplast stromal targeting signal peptides (Keegstra and Cline (1999) The Plant Cell 11:557-570). Furthermore, some targeting peptides are bipartite, directing proteins first to an organelle and then to a location within the organelle (e.g., within the thylakoid lumen of the chloroplast; see Keegstra and Cline (1999) The Plant Cell 11:557-570). In addition to the diversity in sequence and secondary structure, placement of the signal peptide is also varied. Proteins destined for the vacuole, for example, have targeting signal peptides found at the N-terminus, at the C-terminus and at a surface location in mature, folded proteins. Signal peptides also serve as ligands for some receptors.
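Two of the features described above, the vacuolar Asn-Pro-Ile-Arg (NPIR) motif and the hydrophobic character of ER-targeting signal peptides, can be screened for computationally. The following is a crude sketch, not a validated signal peptide predictor: it reports NPIR positions and the highest mean Kyte-Doolittle hydropathy over a sliding window near the N-terminus. The 9-residue window and 30-residue region are hypothetical parameters chosen for illustration, and the example sequences are invented.

```python
import re

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def npir_positions(protein: str) -> list[int]:
    """0-based start positions of the NPIR motif in a one-letter protein sequence."""
    return [m.start() for m in re.finditer("NPIR", protein.upper())]


def max_nterm_hydropathy(protein: str, window: int = 9, region: int = 30) -> float:
    """Highest mean hydropathy of any `window`-residue stretch in the first `region` residues."""
    head = protein.upper()[:region]
    if len(head) < window:
        raise ValueError("sequence is shorter than the scanning window")
    scores = [KYTE_DOOLITTLE[aa] for aa in head]  # KeyError on non-standard residues
    return max(sum(scores[i:i + window]) / window
               for i in range(len(scores) - window + 1))


if __name__ == "__main__":
    er_like = "MKLLVLLLAVLAASAQA" + "DEKR" * 5  # hypothetical hydrophobic N-terminus
    soluble = "MSDEKRQNESTKDEKRQNESTKDE"       # hypothetical hydrophilic N-terminus
    print(max_nterm_hydropathy(er_like) > max_nterm_hydropathy(soluble))  # True
    print(npir_positions("MANPIRSTNPIR"))  # [2, 8]
```

Consistent with the text, such a scan says nothing about chloroplast stromal signals, which lack both a consensus motif and a common structure.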

[2298] Signal proteins can be used to more tightly control the phenotypic expression of introduced SDFs. In particular, associating the appropriate signal sequence with a specific SDF can allow sequestering of the protein in specific organelles (plastids, as an example), secretion outside of the cell, targeting interaction with particular receptors, etc. Hence, the inclusion of signal proteins in constructs involving the SDFs of the invention increases the range of manipulation of SDF phenotypic expression. The nucleotide sequence of the signal peptide can be isolated from characterized genes using common molecular biological techniques or can be synthesized in vitro.
[2299] In addition, the native signal peptide sequences, both amino acid and nucleotide, described in the REF and 
SEQ tables can be used to modulate polypeptide transport. Further variants of the native signal peptides described in 
the REF and SEQ tables are contemplated. Insertions, deletions, or substitutions can be made. Such variants will 
retain at least one of the functions of the native signal peptide as well as exhibiting some degree of sequence identity 
to the native sequence. 

[2300] Also, fragments of the signal peptides of the invention are useful and can be fused with other signal peptides of interest to modulate transport of a polypeptide.

V. Transformation Techniques 

[2301] A wide range of techniques for inserting exogenous polynucleotides are known for a number of host cells, 
including, without limitation, bacterial, yeast, mammalian, insect and plant cells. 

[2302] Techniques for transforming a wide variety of higher plant species are well known and described in the technical and scientific literature. See, e.g., Weising et al., Ann. Rev. Genet. 22:421 (1988); and Christou, Euphytica 85(1-3):13-27 (1995).

[2303] DNA constructs of the invention may be introduced into the genome of the desired plant host by a variety of conventional techniques. For example, the DNA construct may be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant tissue using ballistic methods, such as DNA particle bombardment. Alternatively, the DNA constructs may be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria (McCormac et al., Mol. Biotechnol. 8:199 (1997); Hamilton, Gene 200:107 (1997); Salomon et al., EMBO J. 3:141 (1984); Herrera-Estrella et al., EMBO J. 2:987 (1983)).

[2304] Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski et al., EMBO J. 3:2717 (1984). Electroporation techniques are described in Fromm et al., Proc. Natl. Acad. Sci. USA 82:5824 (1985). Ballistic transformation techniques are described in Klein et al., Nature 327:70 (1987). Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary or cointegrate vectors, are well described in the scientific literature. See, for example, Hamilton, C.M., Gene 200:107 (1997); Muller et al., Mol. Gen. Genet. 207:171 (1987); Komari et al., Plant J. 20:165 (1998); Venkateswarlu et al., Biotechnology 9:1103 (1991); Gleave, A.P., Plant Mol. Biol. 20:1203 (1992); Graves and Goldman, Plant Mol. Biol. 7:34 (1986); and Gould et al., Plant Physiology 95:426 (1991).

[2305] Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant that possesses the transformed genotype and thus the desired phenotype, such as seedlessness. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al., "Protoplasts Isolation and Culture," in Handbook of Plant Cell Culture, pp. 124-176, MacMillan Publishing Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, 1988. Regeneration can also be obtained from plant callus, explants, organs, or parts thereof. Such regeneration techniques are described generally in Klee et al., Ann. Rev. of Plant Phys. 38:467 (1987). Regeneration of monocots (rice) is described by Hosoyama et al. (Biosci. Biotechnol. Biochem. 58:1500 (1994)) and by Ghosh et al. (J. Biotechnol. 32:1 (1994)). The nucleic acids of the invention can be used to confer desired traits on essentially any plant.

[2306] Thus, the invention has use over a broad range of plants, including species from the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus, Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria, Glycine, Gossypium, Helianthus, Hemerocallis, Hordeum, Hyoscyamus, Lactuca, Linum, Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea, Oryza, Panicum, Pennisetum, Persea, Phaseolus, Pistacia, Pisum, Pyrus, Prunus, Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobroma, Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.

[2307] One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.

[2308] The particular sequences of SDFs identified are provided in the attached REF AND SEQ TABLES 1 AND 2. One of ordinary skill in the art, having this data, can obtain cloned DNA fragments, synthetic DNA fragments or polypeptides constituting desired sequences by recombinant methodology known in the art or described herein.

EXAMPLES 

[2309] The invention is illustrated by way of the following examples. The invention is not limited by these examples, as the scope of the invention is defined solely by the following claims.

EXAMPLE 1: cDNA PREPARATION 

[2310] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative of the SDFs of the invention can be obtained by sequencing genomic DNA (gDNA) and/or cDNA from corn plants grown from HYBRID SEED # 35A19, purchased from Pioneer Hi-Bred International, Inc., Supply Management, P.O. Box 256, Johnston, Iowa 50131-0256.

[2311] A number of the nucleotide sequences disclosed in REF AND SEQ TABLES 1 AND 2 herein as representative of the SDFs of the invention can also be obtained by sequencing genomic DNA from Arabidopsis thaliana, Wassilewskija ecotype, or by sequencing cDNA obtained from mRNA from such plants as described below. This is a true-breeding strain. Seeds of the plant are available from the Arabidopsis Biological Resource Center at the Ohio State University under the accession number CS2360. Seeds of this plant were deposited under the terms and conditions of the Budapest Treaty at the American Type Culture Collection, Manassas, VA on August 31, 1999, and were assigned ATCC No. PTA-595.

[2312] Other methods for cloning full-length cDNA are described, for example, by Seki et al., Plant Journal 15:707-720 (1998), "High-efficiency cloning of Arabidopsis full-length cDNA by biotinylated Cap trapper"; Maruyama et al., Gene 138:171 (1994), "Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides"; and WO 96/34981.

[2313] Tissues, or each organ, were individually pulverized and frozen in liquid nitrogen. Next, the samples were homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were added to the sample. The sample was centrifuged and the debris was removed. Then the sample was applied to a 2M sucrose cushion to isolate polysomes. The RNA was isolated by treatment with detergents and proteinase K followed by ethanol precipitation and centrifugation. The polysomal RNA from the different tissues was pooled according to the following mass ratios: 15/15/1 for male inflorescences, female inflorescences and root, respectively. The pooled material was then used for cDNA synthesis by the methods described below.
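The mass ratio used for pooling converts directly into per-tissue amounts. As a small sketch, with a hypothetical total of 310 µg of polysomal RNA (the source does not give absolute amounts), the 15/15/1 ratio for male inflorescences, female inflorescences and root allots 150, 150 and 10 µg; root therefore contributes 1/31, or about 3%, of the pool. The function name and tissue labels are illustrative.

```python
def pooled_masses(total_mass: float, parts: dict[str, int]) -> dict[str, float]:
    """Split total_mass among tissues in proportion to their integer mass parts."""
    denom = sum(parts.values())
    if denom <= 0 or any(p < 0 for p in parts.values()):
        raise ValueError("parts must be non-negative with a positive sum")
    return {tissue: total_mass * p / denom for tissue, p in parts.items()}


if __name__ == "__main__":
    ratio = {"male_inflorescence": 15, "female_inflorescence": 15, "root": 1}
    print(pooled_masses(310.0, ratio))
    # {'male_inflorescence': 150.0, 'female_inflorescence': 150.0, 'root': 10.0}
```

The same function applies to any other mass ratio, for example a two-tissue 9:1 split.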
[2314] Starting material for cDNA synthesis for the exemplary corn cDNA clones with sequences presented in REF AND SEQ TABLES 1 AND 2 was poly(A)-containing polysomal mRNAs from inflorescences and root tissues of corn plants grown from HYBRID SEED # 35A19. Male inflorescences and female (pre- and post-fertilization) inflorescences were isolated at various stages of development. Selection for poly(A)-containing polysomal RNA was done using oligo d(T) cellulose columns, as described by Cox and Goldberg, "Plant Molecular Biology: A Practical Approach," pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford. The quality and the integrity of the polyA+ RNAs were evaluated.
[2315] Starting material for cDNA synthesis for the exemplary Arabidopsis cDNA clones with sequences presented in REF AND SEQ TABLES 1 AND 2 was polysomal RNA isolated from the top-most inflorescence tissues of Arabidopsis thaliana Wassilewskija (Ws.) and from roots of Arabidopsis thaliana Landsberg erecta (L. er.), also obtained from the Arabidopsis Biological Resource Center. Nine parts inflorescence tissue to one part root tissue were used, as measured by wet mass. Tissue was pulverized and exposed to liquid nitrogen. Next, the sample was homogenized in the presence of detergents and then centrifuged. The debris and nuclei were removed from the sample and more detergents were added. The sample was centrifuged, the debris was removed, and the sample was applied to a 2 M sucrose cushion to isolate polysomal RNA (Cox et al., "Plant Molecular Biology: A Practical Approach", pp. 1-35, Shaw ed., c. 1988 by IRL, Oxford). The polysomal RNA was used for cDNA synthesis by the methods described below. Polysomal mRNA was then isolated as described above for corn cDNA. The quality of the RNA was assessed electrophoretically.

[2316] Following preparation of the mRNAs from various tissues as described above, selection of mRNA with intact 5' ends and specific attachment of an oligonucleotide tag to the 5' end of such mRNA was performed using either a chemical or an enzymatic approach. Both techniques take advantage of the presence of the "cap" structure, which characterizes the 5' end of most intact mRNAs and which comprises a guanosine generally methylated once at the 7 position.

[2317] The chemical modification approach involves the optional elimination of the 2', 3'-cis diol of the 3' terminal ribose, the oxidation of the 2', 3'-cis diol of the ribose linked to the cap of the 5' ends of the mRNAs into a dialdehyde, and the coupling of the dialdehyde so obtained to a derivatized oligonucleotide tag. Further detail regarding the chemical approaches for obtaining mRNAs having intact 5' ends is disclosed in International Application No. WO 96/34981, published November 7, 1996.

[2318] The enzymatic approach for ligating the oligonucleotide tag to the intact 5' ends of mRNAs involves the removal of the phosphate groups present on the 5' ends of uncapped incomplete mRNAs, the subsequent decapping of mRNAs having intact 5' ends, and the ligation of the phosphate present at the 5' end of the decapped mRNA to an oligonucleotide tag. Further detail regarding the enzymatic approaches for obtaining mRNAs having intact 5' ends is disclosed in Dumas Milne Edwards J.B. (Doctoral Thesis of Paris VI University, "Le clonage des ADNc complets: difficultés et perspectives nouvelles. Apports pour l'étude de la régulation de l'expression de la tryptophane hydroxylase de rat" [Cloning of complete cDNAs: difficulties and new perspectives. Contributions to the study of the regulation of rat tryptophan hydroxylase expression], 20 Dec. 1993), EPO 625572 and Kato et al., Gene 150:243-250 (1994).

[2319] In both the chemical and the enzymatic approach, the oligonucleotide tag has a restriction enzyme site (e.g., an EcoRI site) therein to facilitate later cloning procedures. Following attachment of the oligonucleotide tag to the mRNA, the integrity of the mRNA is examined by performing a Northern blot using a probe complementary to the oligonucleotide tag.

[2320] After the mRNAs are attached to oligonucleotide tags using either the chemical or the enzymatic method, first strand cDNA synthesis is performed using an oligo(dT) primer with reverse transcriptase. This oligo(dT) primer can contain an internal tag of at least 4 nucleotides, which can be different from one mRNA preparation to another. Methylated dCTP is used for first strand cDNA synthesis to protect the internal EcoRI sites from digestion during subsequent steps. The first strand cDNA is precipitated using isopropanol after removal of RNA by alkaline hydrolysis, to eliminate residual primers.

[2321] Second strand cDNA synthesis is conducted using a DNA polymerase, such as Klenow fragment, and a primer corresponding to the 5' end of the ligated oligonucleotide. The primer is typically 20-25 bases in length. Methylated dCTP is used for second strand synthesis in order to protect internal EcoRI sites in the cDNA from digestion during the cloning process.

[2322] Following second strand synthesis, the full-length cDNAs are cloned into a phagemid vector, such as pBlueScript™ (Stratagene). The ends of the full-length cDNAs are blunted with T4 DNA polymerase (Biolabs) and the cDNA is digested with EcoRI. Since methylated dCTP is used during cDNA synthesis, the EcoRI site present in the tag is the only hemi-methylated site, and hence the only site susceptible to EcoRI digestion. In some instances, to facilitate subcloning, a Hind III adapter is added to the 3' end of the full-length cDNAs.
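The cloning step depends on the tag-borne EcoRI site being the only one that EcoRI cuts. The toy Python model below illustrates that reasoning: it scans for the EcoRI recognition sequence GAATTC and treats sites lying wholly within the 5' tag as cleavable, and all other sites as protected by the methylated dCTP. The tag and insert sequences are hypothetical, and the model ignores strand-level details of methylation.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def ecori_sites(seq):
    """Return 0-based start positions of every EcoRI site in seq."""
    seq = seq.upper()
    n = len(ECORI_SITE)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == ECORI_SITE]


def cleavable_sites(seq, tag_length):
    """Toy model: sites lying wholly inside the 5' tag (the first tag_length
    bases) are treated as cleavable; sites in the cDNA body are treated as
    protected because that DNA was synthesized with methylated dCTP."""
    return [i for i in ecori_sites(seq) if i + len(ECORI_SITE) <= tag_length]


tag = "ATGCGAATTCTA"    # hypothetical 12-base tag containing one EcoRI site
insert = "CCGAATTCGG"   # hypothetical cDNA fragment with an internal site
construct = tag + insert

print(ecori_sites(construct))                 # [4, 14]
print(cleavable_sites(construct, len(tag)))   # [4]
```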

[2323] The full-length cDNAs are then size fractionated using either exclusion chromatography (AcA, Biosepra) or electrophoretic separation, which yields 3 to 6 different fractions. The full-length cDNAs are then directionally cloned into pBlueScript™ using either the EcoRI and SmaI restriction sites or, when the Hind III adapter is present in the full-length cDNAs, the EcoRI and Hind III restriction sites. The ligation mixture is transformed, preferably by electroporation, into bacteria, which are then propagated under appropriate antibiotic selection.
[2324] Clones containing the oligonucleotide tag attached to full-length cDNAs are selected as follows.

[2325] The plasmid cDNA libraries made as described above are purified (e.g., by a column available from Qiagen). A positive selection of the tagged clones is then performed. Briefly, in this selection procedure the plasmid DNA is converted to single-stranded DNA using phage f1 gene II endonuclease in combination with an exonuclease (Chang et al., Gene 127:95 (1993)) such as exonuclease III or T7 gene 6 exonuclease. The resulting single-stranded DNA is then purified using paramagnetic beads as described by Fry et al., Biotechniques 13:124 (1992). Here, the single-stranded DNA is hybridized with a biotinylated oligonucleotide having a sequence corresponding to the 3' end of the oligonucleotide tag. Preferably, the primer has a length of 20-25 bases. Clones including a sequence complementary to the biotinylated oligonucleotide are selected by incubation with streptavidin-coated magnetic beads followed by magnetic capture. After capture of the positive clones, the plasmid DNA is released from the magnetic beads and converted into double-stranded DNA using a DNA polymerase such as ThermoSequenase™ (obtained from Amersham Pharmacia Biotech). Alternatively, protocols such as the Gene Trapper™ kit (Gibco BRL) can be used. The double-stranded DNA is then transformed, preferably by electroporation, into bacteria. The percentage of positive clones having the 5' tag oligonucleotide is typically estimated to be between 90 and 98% from dot blot analysis.
[2326] Following transformation, the libraries are ordered in microtiter plates and sequenced. The Arabidopsis library was deposited at the American Type Culture Collection on January 7, 2000 as E. coli liba 010600 under the accession number PTA-1161.



EXAMPLE 2: SOUTHERN HYBRIDIZATIONS 

[2327] The SDFs of the invention can be used in Southern hybridizations as described above. The following describes extraction of DNA from nuclei of plant cells, digestion of the nuclear DNA and separation of the fragments by length, transfer of the separated fragments to membranes, preparation of probes for hybridization, hybridization, and detection of the hybridized probe.

[2328] The procedures described herein can be used to isolate related polynucleotides or for diagnostic purposes. Moderate stringency hybridization conditions, as defined above, are described in the present example. These conditions result in detection of hybridization between sequences having at least 70% sequence identity. As described above, the hybridization and wash conditions can be changed to reflect the desired percentage of sequence identity between probe and target sequences that can be detected.

[2329] In the following procedure, a probe for hybridization is produced from two PCR reactions using two primers from genomic sequence of Arabidopsis thaliana. As described above, the particular template for generating the probe can be any desired template.

[2330] The first PCR product is assessed to confirm that it is of the expected size. The product of the first PCR is then used as a template, with the same pair of primers used in the first PCR, in a second PCR that produces a labeled product used as the probe.

[2331] Fragments detected by hybridization, or other bands of interest, can be isolated from gels used to separate 
genomic DNA fragments by known methods for further purification and/or characterization. 

Buffers for nuclear DNA extraction

[2332]

1. 10X HB (per 1000 ml)

   40 mM spermidine        10.2 g   Spermine (Sigma S-2876) and spermidine (Sigma S-2501)
   10 mM spermine           3.5 g   stabilize chromatin and the nuclear membrane
   0.1 M EDTA (disodium)   37.2 g   EDTA inhibits nucleases
   0.1 M Tris              12.1 g   Buffer
   0.8 M KCl               59.6 g   Adjusts ionic strength for stability of nuclei

   Adjust the pH to 9.5 with 10 N NaOH. There appears to be a nuclease present in leaves; use of pH 9.5 appears to inactivate this nuclease.

2. 2 M sucrose (684 g per 1000 ml)

   Heat about half the final volume of water to about 50°C. Add the sucrose slowly, then bring the mixture close to the final volume; stir constantly until the sucrose has dissolved. Bring the solution to volume.

3. Sarkosyl solution (lyses nuclear membranes), per 1000 ml

   N-lauroyl sarcosine (Sarkosyl)   20.0 g
   0.1 M Tris                       12.1 g
   0.04 M EDTA (disodium)           14.9 g

   Adjust the pH to 9.5 after all the components are dissolved and bring up to the proper volume.

4. 20% Triton X-100

   80 ml Triton X-100
   320 ml 1X HB (without β-ME and PMSF)

   Prepare in advance; Triton takes some time to dissolve.
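The gram amounts in these recipes follow mass = molarity × volume × formula weight. A minimal Python sketch, assuming the salt and hydration forms named in the dictionary; the source does not state them, and they were chosen because they reproduce the tabulated masses:

```python
# Formula weights (g/mol). Salt/hydration forms are assumptions chosen
# because they reproduce the gram amounts in the recipes above.
FORMULA_WEIGHTS = {
    "spermidine trihydrochloride": 254.6,
    "spermine tetrahydrochloride": 348.2,
    "EDTA disodium dihydrate": 372.24,
    "Tris base": 121.14,
    "KCl": 74.55,
}


def grams_needed(molarity, volume_l, formula_weight):
    """Mass of solute (g) = concentration (mol/L) x volume (L) x FW (g/mol)."""
    return molarity * volume_l * formula_weight


# 10X HB per 1000 ml: (reagent, molar concentration)
HB_10X = [
    ("spermidine trihydrochloride", 0.040),
    ("spermine tetrahydrochloride", 0.010),
    ("EDTA disodium dihydrate", 0.1),
    ("Tris base", 0.1),
    ("KCl", 0.8),
]

for reagent, molarity in HB_10X:
    grams = grams_needed(molarity, 1.0, FORMULA_WEIGHTS[reagent])
    print(f"{reagent}: {grams:.1f} g")
```

The same function gives the 14.9 g of EDTA in the 0.04 M Sarkosyl solution, and, assuming a PMSF formula weight of about 174.2 g/mol, roughly 0.087 g of PMSF for 5 ml of a 100 mM stock, close to the 0.0875 g quoted in the procedure.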

A. Procedure
[2333]

1. Prepare 1X HB buffer (keep ice-cold during use), per 1000 ml:

   10X HB               100 ml
   2 M sucrose          250 ml   a non-ionic osmoticum
   Water                634 ml

   Added just before use:

   100 mM PMSF*          10 ml   a protease inhibitor; protects nuclear membrane proteins
   β-mercaptoethanol      1 ml   inactivates nucleases by reducing disulfide bonds

   *100 mM PMSF (phenylmethylsulfonyl fluoride, Sigma P-7626): add 0.0875 g to 5 ml of 100% ethanol.



2. Homogenize the tissue in a blender (use 300-400 ml of 1X HB per blender), using 5-10 ml of HB buffer per gram of tissue. Blenders generate heat, so keep the homogenate cold by placing the blenders on ice periodically.

3. Add the 20% Triton X-100 (25 ml per liter of homogenate) and gently stir on ice for 20 min. This lyses plastid, but not nuclear, membranes.



4. Filter the tissue suspension through several nylon filters into an ice-cold beaker. The first filtration is through a 250-micron membrane; the second is through an 85-micron membrane; the third is through a 50-micron membrane; and the fourth is through a 20-micron membrane. Use a large funnel to hold the filters. Filtration can be sped up by gently squeezing the liquid through the filters.

5. Centrifuge the filtrate at 1200 x g for 20 min at 4°C to pellet the nuclei.

6. Discard the dark green supernatant. The pellet will have several layers. One is starch, which is white and gritty. The nuclei are gray and soft. In the early steps, there may be a dark green, somewhat viscous layer of chloroplasts.

Wash the pellets in about 25 ml of cold HB buffer (with Triton X-100) and resuspend them by swirling gently and pipetting.

After the pellets are resuspended, pellet the nuclei again at 1200-1300 x g. Discard the supernatant.

Repeat the wash until the supernatant has changed from dark green to pale green; this usually happens after 3 or 4 resuspensions. At this point, the pellet is typically grayish white and very slippery. The Triton X-100 in these repeated steps helps to destroy the chloroplasts and mitochondria that contaminate the prep. Resuspend the nuclei a final time in a total of 15 ml of HB buffer and transfer the suspension to a sterile 125 ml Erlenmeyer flask.

7. While swirling gently, add 15 ml of cold 2% Sarkosyl, 0.1 M Tris, 0.04 M EDTA solution (pH 9.5) dropwise. This lyses the nuclei. The solution will become very viscous.

8. Add 30 grams of CsCl and gently swirl at room temperature until the CsCl is in solution. The mixture will be gray-white and viscous.



9. Centrifuge the solution at 11,400 x g at 4°C for at least 30 min. The longer this spin is, the firmer the protein pellicle.

10. The result is typically a clear green supernatant over a white pellet and (perhaps) under a protein pellicle. Carefully remove the solution under the protein pellicle and above the pellet. Determine the density of the solution by weighing 1 ml of it, and add CsCl if necessary to bring the density to 1.57 g/ml. Because the solution contains other dissolved solids (sucrose, etc.), the refractive index alone will not be an accurate guide to CsCl concentration.

11. Add 20 µl of 10 mg/ml EtBr per ml of solution.

12. Centrifuge at 184,000 x g for 16 to 20 hours in a fixed-angle rotor.

13. Remove the dark red supernatant at the top of the tube with a plastic transfer pipette and discard it. Carefully remove the DNA band with another transfer pipette. The DNA band is usually visible in room light; otherwise, use a long-wave UV light to locate the band.

14. Extract the ethidium bromide with isopropanol saturated with water and salt. Once the solution is clear, extract at least two more times to ensure that all of the EtBr is gone. Be very gentle, as it is very easy to shear the DNA at this step. This extraction may take a while because the DNA solution tends to be very viscous. If the solution is too viscous, dilute it with TE.

15. Dialyze the DNA for at least two days against several changes (at least three) of TE (10 mM Tris, 1 mM EDTA, pH 8) to remove the cesium chloride.

16. Remove the dialyzed DNA from the tubing. If the dialyzed DNA solution contains a lot of debris, centrifuge it at 2500 x g or more for 10 min and carefully transfer the clear supernatant to a new tube. Read the A260 to determine the concentration of the DNA.

17. Assess the quality of the DNA by agarose gel electrophoresis (1% agarose gel). Load 50 ng and 100 ng (based on the OD reading) and compare with DNA of known, good quality. Undigested lambda DNA and lambda Hind III-digested DNA are good molecular weight markers.
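The spins in this procedure are given as relative centrifugal force (x g). Converting to rotor speed depends on the rotor radius, which the source does not give. A minimal sketch using the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimeters; the 10 cm radius is a hypothetical value, so check the rotor's own documentation.

```python
import math

RCF_CONSTANT = 1.118e-5  # RCF = 1.118e-5 * radius_cm * rpm**2


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at the given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach the given RCF at radius_cm."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


RADIUS_CM = 10.0  # hypothetical rotor radius
for label, rcf in [("nuclei pellet", 1200), ("CsCl clearing spin", 11400)]:
    print(f"{label}: {rcf} x g ~ {rpm_from_rcf(rcf, RADIUS_CM):.0f} rpm")
```

Because RCF scales with the radius, the same x g value corresponds to different speeds in different rotors; specifying x g, as the protocol does, keeps the steps rotor-independent.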

Protocol for Digestion of Genomic DNA 

Protocol: 
[2334] 

1. The relative amounts of DNA for different crop plants that provide an approximately balanced number of genome equivalents are given in Table 3. Note that, due to the size of the wheat genome, wheat DNA will be underrepresented. Lambda DNA provides a useful control for complete digestion.

2. Precipitate the DNA by adding 3 volumes of 100% ethanol. Incubate at -20°C for at least two hours. Yeast DNA can be purchased and made up at the necessary concentration, so no precipitation is necessary for yeast DNA.

3. Centrifuge the solution at 11,400 x g for 20 min. Decant the ethanol carefully, without disturbing the pellet. Be sure that the residual ethanol is completely removed, either by vacuum desiccation or by carefully wiping the sides of the tubes with a clean tissue.

4. Resuspend the pellet in an appropriate volume of water. Be sure the pellet is fully resuspended before proceeding to the next step; this may take about 30 min.

5. Add the appropriate volume of the 10X reaction buffer provided by the manufacturer of the restriction enzyme to the resuspended DNA, followed by the appropriate volume of enzyme. Mix thoroughly by slowly swirling the tubes.

6. Set up a lambda digestion control for each DNA that you are digesting.

7. Incubate both the experimental and lambda digests overnight at 37°C. Spin down condensation in a microfuge before proceeding.

8. After digestion, add 2 µl of loading dye (typically 0.25% bromophenol blue, 0.25% xylene cyanol in 15% Ficoll or 30% glycerol) to the lambda control digests and load them on a 1% TPE-agarose gel (TPE is 90 mM Tris-phosphate, 2 mM EDTA, pH 8). If the lambda DNA in the lambda control digests is completely digested, proceed with the precipitation of the genomic DNA in the digests.

9. Precipitate the digested DNA by adding 3 volumes of 100% ethanol and incubating at -20°C for at least 2 hours (preferably overnight).

EXCEPTION: Arabidopsis and yeast DNA are digested in an appropriate volume; they do not have to be precipitated.

10. Resuspend the DNA in an appropriate volume of TE (e.g., 22 µl x 50 blots = 1100 µl) and an appropriate volume of 10X loading dye (e.g., 2.4 µl x 50 blots = 120 µl). Be careful when pipetting the loading dye: it is viscous, so make sure you are pipetting the correct volume.



Table 3
Some guide points in digesting genomic DNA.

Species        Genome size   Size relative     Genome        Amount of DNA
                             to Arabidopsis    equivalents   per blot
Arabidopsis       120 Mb     1X                1X
Brassica        1,100 Mb     9.2X              0.54X         10 µg
Corn            2,800 Mb     23.3X             0.43X         20 µg
Cotton          2,300 Mb     19.2X             0.52X         20 µg
Oat            11,300 Mb     94X               0.11X         20 µg
Rice              400 Mb     3.3X              0.75X         5 µg
Soybean         1,100 Mb     9.2X              0.54X         10 µg
Sugarbeet         758 Mb     6.3X              0.8X          10 µg
Sweetclover     1,100 Mb     9.2X              0.54X         10 µg
Wheat          16,000 Mb     133X              0.08X         20 µg
Yeast              15 Mb     0.12X             1X            0.25 µg
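In Table 3, the "size relative to Arabidopsis" column is genome size divided by 120 Mb, and the genome-equivalent column scales the DNA load by that ratio. The rows are consistent with a reference load of about 2 µg of Arabidopsis DNA; that value is inferred from the other rows, not stated in the source. A minimal Python sketch under that assumption:

```python
ARABIDOPSIS_MB = 120.0


def relative_size(genome_mb, reference_mb=ARABIDOPSIS_MB):
    """Genome size as a multiple of the reference (Arabidopsis) genome."""
    return genome_mb / reference_mb


def genome_equivalents(amount_ug, genome_mb, reference_ug=2.0,
                       reference_mb=ARABIDOPSIS_MB):
    """Genome copies in amount_ug relative to reference_ug of the reference.

    reference_ug=2.0 is an assumption inferred from Table 3, not a stated value.
    """
    return (amount_ug / reference_ug) / relative_size(genome_mb, reference_mb)


for species, size_mb, load_ug in [("Corn", 2800, 20), ("Rice", 400, 5),
                                  ("Yeast", 15, 0.25)]:
    print(f"{species}: {relative_size(size_mb):.1f}X size, "
          f"{genome_equivalents(load_ug, size_mb):.2f}X genome equivalents")
```

The table appears to round the relative size before dividing, so a recomputed value can differ in the last digit (for example, about 0.55X rather than 0.54X for Brassica).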



Protocol for Southern Blot Analysis 

[2335] The digested DNA samples are electrophoresed in 1% agarose gels in 1X TPE buffer. Low-voltage, overnight separations are preferred. The gels are stained with EtBr and photographed.

1. For blotting the gels, first incubate the gel in 0.25 N HCl (with gentle shaking) for about 15 min.

2. The DNA in the gel is then denatured by two incubations, each (with shaking) in 0.5 M NaOH in 1.5 M NaCl for 15 min.

3. The gel is then neutralized by incubating twice (with shaking) in 1.5 M Tris, pH 7.5, in 1.5 M NaCl for 15 min.

4. A nylon membrane is prepared by soaking it in water for at least 5 min, then in 6X SSC for at least 15 min before use. (20X SSC is 175.3 g NaCl and 88.2 g sodium citrate per liter, adjusted to pH 7.0.)

5. The nylon membrane is placed on top of the gel and all bubbles in between are removed. The DNA is blotted from the gel to the membrane using an absorbent medium, such as paper toweling, and 6X SSC buffer. After the transfer, the membrane may be lightly brushed with a gloved hand to remove any agarose sticking to the surface.

6. The DNA is then fixed to the membrane by UV crosslinking and baking at 80°C. The membrane is stored at 4°C until use.
B. Protocol for PCR Amplification of Genomic Fragments in Arabidopsis

Amplification procedures:

[2336] 

1. Mix the following in a 0.2 ml PCR tube or 96-well PCR plate:

   Volume       Stock                                      Final amount or conc.
   0.5 µl       ~10 ng/µl genomic DNA                      5 ng
   2.5 µl       10X PCR buffer                             20 mM Tris, 50 mM KCl
   0.75 µl      50 mM MgCl2                                1.5 mM
   1 µl         10 pmol/µl Primer 1 (Forward)              10 pmol
   1 µl         10 pmol/µl Primer 2 (Reverse)              10 pmol
   0.5 µl       5 mM dNTPs                                 0.1 mM
   0.1 µl       5 units/µl Platinum Taq™ DNA Polymerase    1 unit
                (Life Technologies, Gaithersburg, MD)
   (to 25 µl)   Water





2. The template DNA is amplified using a Perkin Elmer 9700 PCR machine:

   1) 94°C for 10 min, followed by
   2) 5 cycles:  94°C - 30 sec, 62°C - 30 sec, 72°C - 3 min
   3) 5 cycles:  94°C - 30 sec, 58°C - 30 sec, 72°C - 3 min
   4) 25 cycles: 94°C - 30 sec, 53°C - 30 sec, 72°C - 3 min
   5) 72°C for 7 min. Then the reactions are stopped by chilling to 4°C.
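The reaction table can be scaled into a master mix. A minimal Python sketch, assuming the per-reaction volumes in the table, a hypothetical 10% pipetting overage, and that genomic template is added to each well separately. The final-concentration check uses C1V1 = C2V2. By the same arithmetic, 0.1 µl of a 5 units/µl stock is 0.5 unit rather than the 1 unit listed in the table, a discrepancy the source does not resolve.

```python
REACTION_UL = 25.0
TEMPLATE_UL = 0.5  # genomic DNA, added per well rather than to the mix

# (component, ul per reaction, stock value, stock unit) from the table above
RECIPE = [
    ("10X PCR buffer", 2.5, 10.0, "X"),
    ("50 mM MgCl2", 0.75, 50.0, "mM"),
    ("10 pmol/ul primer 1", 1.0, 10.0, "pmol/ul"),
    ("10 pmol/ul primer 2", 1.0, 10.0, "pmol/ul"),
    ("5 mM dNTPs", 0.5, 5.0, "mM"),
    ("5 U/ul Platinum Taq", 0.1, 5.0, "U/ul"),
]


def final_in_reaction(volume_ul, stock, unit, total_ul=REACTION_UL):
    """Per-reaction amount (pmol, U) or final concentration (X, mM)."""
    if unit in ("pmol/ul", "U/ul"):
        return volume_ul * stock           # amount delivered to the tube
    return stock * volume_ul / total_ul    # C1V1 = C2V2


def water_ul(total_ul=REACTION_UL):
    """Water needed to bring one reaction to total_ul."""
    return total_ul - TEMPLATE_UL - sum(v for _, v, _, _ in RECIPE)


def master_mix(n_reactions, overage=0.10):
    """Volumes (ul) for n_reactions plus a fractional overage."""
    factor = n_reactions * (1 + overage)
    mix = {name: v * factor for name, v, _, _ in RECIPE}
    mix["water"] = water_ul() * factor
    return mix


for name, ul in master_mix(96).items():
    print(f"{name}: {ul:.1f} ul")
```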



[2337] The procedure can be adapted to a multi-well format if necessary.
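The cycling program in step 2 steps the annealing temperature down (5 cycles at 62°C, 5 at 58°C, then 25 at 53°C), in the manner of touchdown PCR. Representing the program as data makes the cycle count and minimum run time easy to check. A minimal Python sketch; the time counts hold periods only and ignores ramping, which depends on the instrument.

```python
from dataclasses import dataclass


@dataclass
class Step:
    temp_c: float
    seconds: int


def total_cycles(blocks):
    """Sum of cycle counts over (n_cycles, steps) blocks."""
    return sum(n for n, _ in blocks)


def hold_time_seconds(initial, blocks, final):
    """Total hold time of the program, ignoring ramp times."""
    cycling = sum(n * sum(step.seconds for step in steps) for n, steps in blocks)
    return initial.seconds + cycling + final.seconds


GENOMIC_PCR = [
    (5, [Step(94, 30), Step(62, 30), Step(72, 180)]),
    (5, [Step(94, 30), Step(58, 30), Step(72, 180)]),
    (25, [Step(94, 30), Step(53, 30), Step(72, 180)]),
]

seconds = hold_time_seconds(Step(94, 600), GENOMIC_PCR, Step(72, 420))
print(total_cycles(GENOMIC_PCR), "cycles,", seconds / 60, "min of hold time")
```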

Quantification and Dilution of PCR Products: 

[2338] 

1. The product of the PCR is analyzed by electrophoresis in a 1% agarose gel. A linearized plasmid DNA can be used as a quantification standard (usually at 50, 100, 200, and 400 ng); these serve as references to approximate the amount of PCR product. Hind III-digested lambda DNA is useful as a molecular weight marker. The gel can be run fairly quickly, e.g., at 100 volts. The standard gel is examined to determine whether the size of the PCR products is consistent with the expected size and whether there are significant extra bands or smeared products in the PCR reactions.

2. The amounts of PCR products can be estimated on the basis of the plasmid standard. 

3. For the small number of reactions that produce extraneous bands, a small amount of DNA from bands of the correct size can be isolated by dipping a sterile 10 µl tip into the band while viewing it on a UV transilluminator. The small amount of agarose gel (with the DNA fragment) is used in the labeling reaction.
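Steps 1 and 2 estimate PCR yield by comparing bands with the 50, 100, 200 and 400 ng plasmid standards. If band intensities are measured with imaging software (an assumption; the source compares bands visually), a least-squares line through the standards gives a simple estimate within their range. The intensity values below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_ng(intensity, standard_ng, standard_intensity):
    """Invert the standard curve to estimate the DNA amount in a band."""
    slope, intercept = fit_line(standard_ng, standard_intensity)
    return (intensity - intercept) / slope


STANDARD_NG = [50, 100, 200, 400]
STANDARD_SIGNAL = [1000, 2000, 4000, 8000]  # hypothetical band intensities

print(estimate_ng(3000, STANDARD_NG, STANDARD_SIGNAL))  # 150.0
```

Estimates are only meaningful inside the range of the standards, since band intensity stops increasing linearly at high DNA loads.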
C. Protocol for PCR-DIG-Labeling of DNA

Solutions:



[2339] 



Reagents in PCR reactions (diluted PCR products, 10X PCR buffer, 50 mM MgCl2, 5 U/µl Platinum Taq polymerase, and the primers)

10X dNTP + DIG-11-dUTP [1:5]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.65 mM dTTP, 0.35 mM DIG-11-dUTP)
10X dNTP + DIG-11-dUTP [1:10]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.81 mM dTTP, 0.19 mM DIG-11-dUTP)
10X dNTP + DIG-11-dUTP [1:15]: (2 mM dATP, 2 mM dCTP, 2 mM dGTP, 1.875 mM dTTP, 0.125 mM DIG-11-dUTP)
TE buffer (10 mM Tris, 1 mM EDTA, pH 8)

Maleate buffer: In 700 ml of deionized distilled water, dissolve 11.61 g maleic acid and 8.77 g NaCl. Add NaOH to adjust the pH to 7.5. Bring the volume to 1 L. Stir for 15 min and sterilize.

10% blocking solution: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176). Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.

1% blocking solution: Dilute the 10% stock to 1% using the maleate buffer.

Buffer 3 (100 mM Tris, 100 mM NaCl, 50 mM MgCl2, pH 9.5): Prepared from autoclaved solutions of 1 M Tris pH 9.5, 5 M NaCl, and 1 M MgCl2 in autoclaved distilled water.
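In each 10X labeling mix listed above, dTTP plus DIG-11-dUTP totals 2 mM, matching the other three dNTPs, and the note in the labeling procedure maps probe length to mix. A small Python helper, assuming that a probe of exactly 1.8 kb uses the 1:10 mix (the source's ranges do not say which side the boundary falls on):

```python
# 10X mixes from the solutions list: name -> (dTTP mM, DIG-11-dUTP mM)
DIG_MIXES = {
    "1:5": (1.65, 0.35),
    "1:10": (1.81, 0.19),
    "1:15": (1.875, 0.125),
}


def choose_dig_mix(probe_length_bp):
    """Select the labeling mix by probe length (<1 kb, 1-1.8 kb, >1.8 kb)."""
    if probe_length_bp < 1000:
        return "1:5"
    if probe_length_bp <= 1800:  # boundary assignment is an assumption
        return "1:10"
    return "1:15"


def dig_fraction(mix):
    """Fraction of the thymidine pool supplied as DIG-11-dUTP."""
    dttp, dig = DIG_MIXES[mix]
    return dig / (dttp + dig)


for length in (600, 1500, 2500):
    mix = choose_dig_mix(length)
    print(length, "bp ->", mix, f"({dig_fraction(mix):.1%} DIG-dUTP)")
```

Longer probes get a lower DIG-dUTP fraction, consistent with the note's progression from the 1:5 to the 1:15 mix.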



Procedure: 



[2340] 

1. PCR reactions are performed in 25 µl volumes containing:

   PCR buffer                   1X
   MgCl2                        1.5 mM
   10X dNTP + DIG-11-dUTP       1X (see the note below)
   Platinum Taq™ Polymerase     1 unit
   10 pg probe DNA
   10 pmol primer 1

   Note:
   Mix                              Use for:
   10X dNTP + DIG-11-dUTP (1:5)     <1 kb
   10X dNTP + DIG-11-dUTP (1:10)    1 kb to 1.8 kb
   10X dNTP + DIG-11-dUTP (1:15)    >1.8 kb

2. The PCR reaction uses the following amplification cycles:
   1) 94°C for 10 min
   2) 5 cycles:  95°C - 30 sec, 61°C - 1 min, 73°C - 5 min
   3) 5 cycles:  95°C - 30 sec, 59°C - 1 min, 75°C - 5 min
   4) 25 cycles: 95°C - 30 sec, 51°C - 1 min, 73°C - 5 min
   5) 72°C for 8 min. The reactions are terminated by chilling to 4°C (hold).



3. The products are analyzed by electrophoresis in a 1% agarose gel, in comparison with an aliquot of the unlabeled probe starting material.

4. The amount of DIG-labeled probe is determined as follows:

Make serial dilutions of the diluted control DNA in dilution buffer (TE: 10 mM Tris and 1 mM EDTA, pH 8) as shown in the following table:

   DIG-labeled control DNA   Stepwise dilution    Final conc.
   starting conc.                                 (dilution name)
   5 ng/µl                   1 µl in 49 µl TE     100 pg/µl (A)
   100 pg/µl (A)             25 µl in 25 µl TE    50 pg/µl (B)
   50 pg/µl (B)              25 µl in 25 µl TE    25 pg/µl (C)
   25 pg/µl (C)              20 µl in 30 µl TE    10 pg/µl (D)



a. Serial dilutions of a DIG-labeled standard DNA ranging from 100 pg to 10 pg are spotted onto a positively charged nylon membrane, marking the membrane lightly with a pencil to identify each dilution.

b. Serial dilutions (e.g., 1:50, 1:2500, 1:10,000) of the newly labeled DNA probe are spotted.

c. The membrane is fixed by UV crosslinking. 

d. The membrane is wetted with a small amount of maleate buffer and then incubated in 1% blocking solution for 15 min at room temperature.

e. The labeled DNA is then detected using alkaline phosphatase-conjugated anti-DIG antibody (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) and an NBT substrate according to the manufacturer's instructions.

f. Spot intensities of the control and experimental dilutions are then compared to estimate the concentration of the PCR-DIG-labeled probe.
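Each row of the dilution table multiplies the concentration by sample volume / (sample + diluent volume), and step f scales the matching standard back up by the probe's dilution factor. A minimal Python sketch, assuming equal volumes are spotted for standards and probe dilutions; the matched spot in the example is hypothetical.

```python
def serial_dilution(start_conc, steps):
    """Concentrations after each (sample_ul, diluent_ul) transfer, in order."""
    concs = []
    conc = start_conc
    for sample_ul, diluent_ul in steps:
        conc = conc * sample_ul / (sample_ul + diluent_ul)
        concs.append(conc)
    return concs


def probe_concentration(matched_standard, dilution_factor):
    """Undiluted probe concentration when a spot of the probe diluted
    1:dilution_factor matches a standard spot of equal volume."""
    return matched_standard * dilution_factor


# DIG-labeled control DNA series from the table (5 ng/ul = 5000 pg/ul)
standards = serial_dilution(5000.0, [(1, 49), (25, 25), (25, 25), (20, 30)])
print(standards)  # [100.0, 50.0, 25.0, 10.0]

# Hypothetical result: the 1:2500 probe spot matches standard D (10 pg/ul)
print(probe_concentration(standards[-1], 2500) / 1000, "ng/ul")
```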

D. Prehybridization and Hybridization of Southern Blots

Solutions:



[2341] 



100% Formamide: purchased from Gibco

20X SSC (1X = 0.15 M NaCl, 0.015 M Na3citrate), per L:
   175 g NaCl
   87.5 g Na3citrate·2H2O

20% Sarkosyl (N-lauroyl-sarcosine)
20% SDS (sodium dodecyl sulphate)

10% Blocking Reagent: In 80 ml deionized distilled water, dissolve 1.16 g maleic acid. Next, add NaOH to adjust the pH to 7.5. Add 10 g of the blocking reagent powder. Heat to 60°C while stirring to dissolve the powder. Adjust the volume to 100 ml with water. Stir and sterilize.



Prehybridization Mix:

   Final concentration   Component          Volume (per 100 ml)   Stock
   50%                   Formamide          50 ml                 100%
   5X                    SSC                25 ml                 20X
   0.1%                  Sarkosyl           0.5 ml                20%
   0.02%                 SDS                0.1 ml                20%
   2%                    Blocking Reagent   20 ml                 10%
                         Water              4.4 ml
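Each volume in the prehybridization mix follows V(stock) = C(final) × V(total) / C(stock), with water making up the balance. A minimal Python sketch for scaling the recipe to other total volumes (the 30 ml example is illustrative):

```python
def stock_volume(final_conc, stock_conc, total_volume):
    """C1V1 = C2V2 solved for the volume of stock to add."""
    return final_conc * total_volume / stock_conc


# (component, final concentration, stock concentration) in matching units
PREHYB = [
    ("Formamide (%)", 50, 100),
    ("SSC (X)", 5, 20),
    ("Sarkosyl (%)", 0.1, 20),
    ("SDS (%)", 0.02, 20),
    ("Blocking reagent (%)", 2, 10),
]


def prehyb_recipe(total_ml=100.0):
    """Volumes (ml) of each stock plus water for total_ml of mix."""
    volumes = {name: stock_volume(final, stock, total_ml)
               for name, final, stock in PREHYB}
    volumes["Water"] = total_ml - sum(volumes.values())
    return volumes


for name, ml in prehyb_recipe(30.0).items():
    print(f"{name}: {ml:.2f} ml")
```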





General Procedures: 
[2342] 

1. Place the blot in a heat-sealable plastic bag and add an appropriate volume of prehybridization solution (30 ml/100 cm²) at room temperature. Seal the bag with a heat sealer, avoiding bubbles as much as possible. Lay the bags down in a large plastic tray (one tray can accommodate at least 4-5 bags). Ensure that the bags are lying flat in the tray so that the prehybridization solution is evenly distributed throughout the bag. Incubate the blot for at least 2 hours with gentle agitation using a wave shaker.

2. Denature the DIG-labeled DNA probe by incubating it for 10 min at 98°C in the PCR machine, and immediately cool it to 4°C.

3. Add probe to the prehybridization solution (25 ng/ml; 30 ml = 750 ng total probe) and mix well, but avoid foaming: bubbles may lead to background.

4. Pour off the prehybridization solution from the hybridization bags and add the new prehybridization and probe solution mixture to the bags containing the membrane.

5. Incubate with gentle agitation for at least 16 hours.

6. Proceed to the medium-stringency post-hybridization wash:

Wash three times for 20 min each with gentle agitation using 1X SSC, 1% SDS at 60°C.

All wash solutions must be prewarmed to 60°C. Use about 100 ml of wash solution per membrane.

To avoid background, keep the membranes fully submerged so that they do not dry in spots, and agitate sufficiently to keep the membranes from sticking to one another.

7. After the wash, proceed to immunological detection and CSPD development. 
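Steps 1 and 3 scale with membrane area: 30 ml of solution per 100 cm² and 25 ng of probe per ml. A minimal Python sketch for other membrane sizes; the 20 × 15 cm membrane and the 25 ng/µl probe stock are hypothetical values.

```python
ML_PER_100_CM2 = 30.0   # prehybridization/hybridization volume (step 1)
PROBE_NG_PER_ML = 25.0  # probe concentration in the hybridization mix (step 3)


def hybridization_plan(width_cm, height_cm, probe_stock_ng_per_ul):
    """Return (solution ml, probe ng, probe stock ul) for one membrane."""
    area_cm2 = width_cm * height_cm
    volume_ml = area_cm2 * ML_PER_100_CM2 / 100.0
    probe_ng = volume_ml * PROBE_NG_PER_ML
    return volume_ml, probe_ng, probe_ng / probe_stock_ng_per_ul


volume, probe_ng, stock_ul = hybridization_plan(20, 15, 25.0)
print(f"{volume:.0f} ml solution, {probe_ng:.0f} ng probe, {stock_ul:.0f} ul of stock")
```

A 100 cm² membrane reproduces the worked figure in step 3 (30 ml and 750 ng of probe).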
E. Procedure for Immunological Detection with CSPD 

Solutions:
[2343] 

Buffer 1: Maleic acid buffer (0.1 M maleic acid, 0.15 M NaCl; adjusted to pH 7.5 with NaOH)

Washing buffer: Maleic acid buffer with 0.3% (v/v) Tween 20.

Blocking stock solution (10X concentration): 10% blocking reagent in Buffer 1. Dissolve the blocking reagent powder (Boehringer Mannheim, Indianapolis, IN, cat. no. 1096176) by constant stirring on a 65°C heating block or by heating in a microwave; autoclave and store at 4°C.

Buffer 2 (1X blocking solution): Dilute the stock solution 1:10 in Buffer 1.

Detection buffer: 0.1 M Tris, 0.1 M NaCl, pH 9.5

Procedure: 
[2344] 

1. After the post-hybridization wash, the blots are briefly rinsed (1-5 min) in the maleate washing buffer with gentle shaking.

2. Then the membranes are incubated for 30 min. in Buffer 2 with gentle shaking. 

3. Anti-DIG-AP conjugate (Boehringer Mannheim, Indianapolis, IN, cat. no. 1093274) at 75 mU/ml (1:10,000) in Buffer 2 is used for detection. 75 ml of solution can be used for 3 blots.

4. The membrane is incubated for 30 min. in the antibody solution with gentle shaking. 

5. The membranes are washed twice in washing buffer with gentle shaking. About 250 ml is used per wash for 3 blots.

6. The blots are equilibrated for 2-5 min in 60 ml detection buffer. 

7. Dilute CSPD (1:200) in detection buffer. (This can be prepared ahead of time and stored in the dark at 4°C.) The following steps must be done individually. Bags (one for detection and one for exposure) are generally cut and ready before doing the following steps.

8. The blot is carefully removed from the detection buffer and excess liquid is removed without drying the membrane. The blot is immediately placed in a bag and 1.5 ml of CSPD solution is added. The CSPD solution can be spread over the membrane. Bubbles present at the edge and on the surface of the blot are typically removed by gentle rubbing. The membrane is incubated for 5 min in CSPD solution.

9. Excess liquid is removed and the membrane is blotted briefly (DNA side up) on Whatman 3MM paper. Do not let the membrane dry completely.

10. Seal the damp membrane in a hybridization bag and incubate for 10 min at 37°C to enhance the luminescent reaction.

11. Expose to X-ray film for 2 hours at room temperature. Multiple exposures can be taken. Luminescence continues for at least 24 hours, and signal intensity increases during the first hours.

EXAMPLE 3: TRANSFORMATION OF CARROT CELLS

[2345] Transformation of plant cells can be accomplished by a number of methods, as described above. Similarly, a number of plant genera can be regenerated from tissue culture following transformation. Transformation and regeneration of carrot cells as described herein is illustrative.

[2346] Single cell suspension cultures of carrot (Daucus carota) cells are established from hypocotyls of cultivar Early Nantes in B5 growth medium (O.L. Gamborg et al., Plant Physiol. 45:372 (1970)) plus 2,4-D and 15 mM CaCl2 (B5-44 medium) by methods known in the art. The suspension cultures are subcultured by adding 10 ml of the suspension culture to 40 ml of B5-44 medium in 250 ml flasks every 7 days and are maintained in a shaker at 150 rpm at [...] in the dark.
[2347] The cells are transformed with exogenous DNA as described by Z. Chen et al., Plant Mol. [...]. Briefly, [...] post-subculture cells are incubated with cell wall digestion solution containing 0.4 M sorbitol, 2% driselase, and 5 mM MES (2-[N-morpholino]ethanesulfonic acid), pH 5.0, for 5 hours. The digested cells are pelleted gently at 60 x g for 5 min and washed twice in W5 solution containing 154 mM NaCl, 5 mM KCl, 125 mM CaCl2 and 5 mM glucose, pH 6.0. The protoplasts are suspended in MC solution containing 5 mM MES, 20 mM CaCl2, 0.1 M mannitol, pH 5.7, and the protoplast density is adjusted to about 4 × 10⁶ protoplasts per ml.
[2348] 15-60 µg of plasmid DNA is mixed with 0.9 ml of protoplasts. The resulting suspension is mixed with 40% polyethylene glycol (MW 8000, PEG 8000) by gentle inversion a few times at room temperature [...] PEG-DNA-protoplast mixture. Protoplasts are incubated in the culture medium for 24 hours to 5 days, and cell extracts can be used for assay of transient expression of the introduced gene. Alternatively, transformed cells can be used to produce transgenic callus, which in turn can be used [...] (1985), "Identification and Isolation of Single Cells that Produce Somatic Embryos in Carrot Suspension Cultures".

[2349] While the invention has been described, it will be apparent to one of ordinary skill in the art that various modifications of the materials and methods for practicing the invention can be made. Such modifications are to be considered within the scope of the invention as defined by the following claims.

[2350] Each of the references from the patent and periodical literature cited herein is hereby expressly incorporated in its entirety by such citation.



Claims 

1. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which encodes an amino
acid sequence exhibiting at least 40% sequence identity to an amino acid sequence encoded by 

(a) a nucleotide sequence described in REF and/or SEQ Table 1 or 2 or a fragment thereof or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof. 

2. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least 65% sequence identity to

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof; or

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

3. An isolated nucleic acid molecule comprising a nucleic acid having a nucleotide sequence which exhibits at least 65% sequence identity to a gene comprising

(a) a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof or 

(b) a complement of a nucleotide sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof. 

4. An isolated nucleic acid molecule which is the reverse of the isolated nucleotide sequence according to any one of claims 1-3, such that the reverse nucleotide sequence has a sequence order which is the reverse of the sequence order of said isolated nucleotide sequence according to any one of claims 1-3.

5. An isolated nucleic acid molecule comprising a nucleic acid capable of hybridizing to a nucleic acid having a sequence selected from the group consisting of:

(a) a nucleotide sequence which is shown in REF and/or SEQ Table 1 or 2; and 

(b) a nucleotide sequence which is complementary to a nucleotide sequence shown in REF and/or SEQ Table 1 or 2,

under conditions that permit formation of a nucleic acid duplex at a temperature from about 40°C to about 48°C below the melting temperature of the nucleic acid duplex.

6. The isolated nucleic acid molecule according to any one of claims 1-5, wherein said nucleic acid comprises an open reading frame.
7. The isolated nucleic acid molecule of any one of claims 1-5, wherein said nucleic acid is capable of functioning as a promoter, a 3' end termination sequence, an untranslated region (UTR), or a regulatory sequence.

8. The isolated nucleic acid molecule of claim 7, wherein said nucleic acid is a promoter and comprises a sequence selected from the group consisting of a TATA box sequence, a CAAT box sequence, a motif of GCAATCG or any transcription-factor binding sequence, and any combination thereof.

9. The isolated nucleic acid molecule of claim 7, wherein the nucleic acid sequence is a regulatory sequence which is capable of promoting seed-specific expression, embryo-specific expression, ovule-specific expression, tapetum-specific expression or root-specific expression of a sequence, or any combination thereof.

10. A vector construct comprising a nucleic acid molecule according to any one of claims 1-9, wherein said nucleic 
acid molecule is heterologous to any element in said vector construct. 

11. A vector construct according to claim 10 comprising: 

(a) a first nucleic acid having a regulatory sequence capable of causing transcription and/or translation- and 

(b) a second nucleic acid having the sequence of said isolated nucleic acid molecule according to any one of claims 1-4;

wherein said first and second nucleic acids are operably linked and wherein said second nucleic acid is heterolo- 
gous to any element in said vector construct. 

12. The vector construct according to claim 11, wherein said first nucleic acid is native to said second nucleic acid.

13. The vector construct according to claim 11, wherein said first nucleic acid is heterologous to said second nucleic acid.

14. A vector construct according to claim 10 comprising: 

(c) a first nucleic acid having the sequence of said isolated nucleic acid molecule according to claim 7; and

(d) a second nucleic acid; 

wherein said first and second nucleic acids are operably linked and wherein said first nucleic acid is heterologous 
to any element in said vector construct. 

15. The vector construct according to claim 14, wherein said first nucleic acid is native to said second nucleic acid. 

16. The vector construct according to claim 14, wherein said first nucleic acid is heterologous to said second nucleic acid.

17. A host cell comprising an isolated nucleic acid molecule according to any one of claims 1-4, wherein said nucleic
acid molecule is flanked by exogenous sequence. 

18. A host cell comprising a vector construct of any one of claims 10-16. 

19. An isolated polypeptide comprising an amino acid sequence

(a) exhibiting at least 40% sequence identity to an amino acid sequence encoded by a sequence shown in
REF and/or SEQ Table 1 or 2 or a fragment thereof; and 

(b) capable of exhibiting at least one of the biological activities of the polypeptide encoded by said nucleotide 
sequence shown in REF and/or SEQ Table 1 or 2 or a fragment thereof.

20. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 75% sequence identity to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

21. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 85% sequence identity to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

22. The isolated polypeptide of claim 19, wherein said amino acid sequence exhibits at least 90% sequence identity to an amino acid sequence encoded by a sequence shown in SEQ Table 1 or 2 or a fragment thereof.

23. An antibody capable of binding the isolated polypeptide of any one of claims 19-22. 

24. A method of introducing an isolated nucleic acid into a host cell comprising: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-4; and

(b) introducing said isolated nucleic acid molecule into said host cell.

25. [...] a vector construct according to [...]

26. A method of modulating transcription and/or translation of a nucleic acid in a host cell comprising: 

(a) providing the host cell of claim 24 or 25; and 

(b) culturing said host cell under conditions that permit transcription or translation. 

27. A method for detecting a nucleic acid in a sample which comprises: 

(a) providing an isolated nucleic acid molecule according to any one of claims 1-5;

(b) contacting said isolated nucleic acid molecule with a sample under conditions which permit a comparison of the sequence of said isolated nucleic acid molecule with the sequence of DNA in said sample; and

(c) analyzing the result of said comparison.

28. The method according to claim 27, wherein said isolated nucleic acid molecule and said sample are contacted under conditions which permit the formation of a duplex between complementary nucleic acid sequences.

[...]

31. A plant or cell of a plant which has been transformed with a nucleic acid molecule according to any one of claims 1-4.

32. A plant or cell of a plant which comprises a vector construct according to any one of claims 10-16.

33. A plant or cell of a plant which has been transformed with a vector construct according to any one of claims 10-16.

34. A plant which has been regenerated from a plant cell according to any one of claims 29-33. 
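Claims 1-3 and 19-22 recite minimum percent sequence identities (from 40% up to 90%). This section does not say how identity is computed, so the following is a minimal sketch under one assumed convention. The two sequences are already aligned by some program, a gap opposite a residue counts as a mismatch, and columns gapped in both sequences are ignored. The function names are hypothetical, and the same code applies to nucleotide or amino acid alignments.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Assumed convention: '-' marks a gap; a column with a gap in only one
    sequence counts as a mismatch; a column gapped in both is skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # empty column, not counted
        columns += 1
        if x == y:  # both are residues here, since double gaps were skipped
            identical += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * identical / columns


def meets_threshold(aligned_a, aligned_b, threshold_pct):
    """True if identity is at least `threshold_pct` (e.g. 40, 65, 75, 85, 90)."""
    return percent_identity(aligned_a, aligned_b) >= threshold_pct


if __name__ == "__main__":
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))
```

Here `ACGT-A` against `ACGTTA` has 5 identical positions out of 6 counted columns. A different gap convention (for example, dividing by the length of the shorter ungapped sequence) would give a different percentage for the same alignment.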
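Claim 4 covers the reverse of a sequence (the same bases in the opposite order), which is not the same as the complementary sequences referred to in claims 1-3 and 5. The sketch below contrasts the three string operations and adds a hypothetical perfect-match test related to duplex formation (claims 5 and 28). A probe pairs antiparallel with its target, so an exactly matched duplex exists only if the probe's reverse complement occurs in the target. Real hybridization tolerates mismatches, and this exact-match check ignores them.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Same bases in the opposite order (no base pairing applied)."""
    return seq[::-1]


def complement(seq):
    """Base-paired partner, read in the same direction as `seq`."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """The antiparallel partner strand, written 5' to 3'."""
    return complement(seq)[::-1]


def can_form_perfect_duplex(probe, target):
    """True if some stretch of `target` pairs with every base of `probe`.

    Hypothetical exact-match test: the probe pairs antiparallel, so we look
    for its reverse complement in the target (both written 5' to 3').
    """
    return reverse_complement(probe.upper()) in target.upper()


if __name__ == "__main__":
    s = "ATGC"
    print(reverse(s), complement(s), reverse_complement(s))
    print(can_form_perfect_duplex("GCAT", "TTATGCAA"))
```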
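Claim 5 sets hybridization conditions relative to the melting temperature (Tm) of the duplex, from about 40°C to about 48°C below Tm. The claim does not say how Tm is obtained. As an assumption for illustration, the sketch uses two common textbook approximations: the Wallace rule, Tm = 2(A+T) + 4(G+C), for sequences shorter than 14 nt, and Tm = 64.9 + 41(G+C - 16.4)/N for longer ones. Neither formula accounts for salt, formamide or mismatches, so the results are rough estimates.

```python
def estimate_tm(seq):
    """Rough DNA melting temperature in degrees C (assumed formulas).

    Under 14 nt: Wallace rule, 2*(A+T) + 4*(G+C).
    14 nt or longer: 64.9 + 41*(G+C - 16.4)/N.
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if n < 14:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / n


def hybridization_window(tm, low_offset=48.0, high_offset=40.0):
    """(lowest, highest) temperature, from about 48 to 40 degrees C below Tm."""
    return tm - low_offset, tm - high_offset


if __name__ == "__main__":
    duplex = "ATGC" * 125  # hypothetical 500 nt sequence, 50% GC
    tm = estimate_tm(duplex)
    low, high = hybridization_window(tm)
    print(f"Tm ~ {tm:.1f} C; hybridize between {low:.1f} and {high:.1f} C")
```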
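Claim 6 refers to an open reading frame, and claim 19 to the amino acid sequence encoded by a listed nucleotide sequence. The following minimal sketch covers both steps. It assumes the standard genetic code, ATG as the only start codon, and an arbitrary minimum ORF length. It also scans only the three forward frames; for the other strand, run it on the reverse complement. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate codon by codon, stopping at the first stop codon.
    Codons containing non-ACGT characters become 'X'."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_orfs(seq, min_codons=30):
    """Return (start, end, protein) for ATG...stop ORFs in the three forward
    frames. `end` is exclusive and includes the stop codon; `min_codons`
    counts codons before the stop (an arbitrary cut-off)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                for j in range(i, len(seq) - 2, 3):
                    if CODON_TABLE.get(seq[j:j + 3]) == "*":
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, translate(seq[i:j])))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # no in-frame stop before the end of the sequence
            i += 3
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```

In the demo sequence, the ORF starts at index 2 (ATG), runs through AAA and TTT, and ends with the TAG stop codon. It therefore translates to `MKF`.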
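Claim 8 lists a TATA box sequence, a CAAT box sequence and a GCAATCG motif as promoter elements. The regular expressions below are simplified consensus patterns chosen for illustration (TATAWAW with W = A or T, and CCAAT). They are not the patent's own definitions. Only the given strand is scanned, and a zero-width lookahead makes overlapping occurrences visible.

```python
import re

# Simplified consensus patterns, assumed for illustration only.
MOTIFS = {
    "TATA box": "TATA[AT]A[AT]",  # TATAWAW, W = A or T
    "CAAT box": "CCAAT",
    "GCAATCG motif": "GCAATCG",
}


def scan_motifs(seq):
    """Return (motif name, 0-based start, matched text), sorted by position."""
    seq = seq.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # The lookahead matches at every start position, so overlaps are kept.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, m.start(), m.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for hit in scan_motifs("GGCCAATCTTTTATAAATGCAATCG"):
        print(hit)
```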